OPTIMIZED NETWORK RESOURCE LOCATION

FIELD OF THE INVENTION

This invention relates to replication of resources in computer
networks.

BACKGROUND OF THE INVENTION

The advent of global computer networks, such as the Internet,
has led to entirely new and different ways to obtain
information. A user of the Internet can now access information
from anywhere in the world, with no regard for the actual
location of either the user or the information. A user can
obtain information simply by knowing a network address for the
information and providing that address to an appropriate
application program such as a network browser.

The rapid growth in popularity of the Internet has imposed a
heavy traffic burden on the entire network. Solutions to
problems of demand (e.g., better accessibility and faster
communication links) only increase the strain on the supply.
Internet Web sites (referred to here as "publishers") must
handle ever-increasing bandwidth needs, accommodate dynamic
changes in load, and improve performance for distant browsing
clients, especially those overseas. The adoption of
content-rich applications, such as live audio and video, has
further exacerbated the problem.

To address basic bandwidth growth needs, a Web publisher
typically subscribes to additional bandwidth from an Internet
service provider (ISP), whether in the form of larger or
additional "pipes" or channels from the ISP to the publisher's
premises, or in the form of large bandwidth commitments in an
ISP's remote hosting server collection. These increments are
not always as fine-grained as the publisher needs, and quite
often lead times can cause the publisher's Web site capacity to
lag behind demand.

To address more serious bandwidth growth problems, publishers
may develop more complex and costly custom solutions. The
solution to the most common need, increasing capacity, is
generally based on replication of hardware resources and site
content (known as mirroring), and duplication of bandwidth
resources. These solutions, however, are difficult and
expensive to deploy and operate. As a result, only the largest
publishers can afford them, since only those publishers can
amortize the costs over many customers (and Web site hits).

A number of solutions have been developed to advance
replication and mirroring. In general, these technologies are
designed for use by a single Web site and do not include
features that allow their components to be shared by many Web
sites simultaneously.

Some solution mechanisms offer replication software that helps
keep mirrored servers up-to-date. These mechanisms generally
operate by making a complete copy of a file system. One such
system operates by transparently keeping multiple copies of a
file system in synch. Another system provides mechanisms for
explicitly and regularly copying files that have changed.
Database systems are particularly difficult to replicate, as
they are continually changing. Several mechanisms allow for
replication of databases, although [...].

Several companies offering proxy caches describe them as
replication tools. However, proxy caches differ because they
are operated on behalf of clients rather than publishers.

Once a Web site is served by multiple servers, a challenge is
to ensure that the load is appropriately distributed or
balanced among those servers. Domain name-server-based
round-robin address resolution causes different clients to be
directed to different mirrors.

Another solution, load balancing, takes into account the load
at each server (measured in a variety of ways) to select which
server should handle a particular request.

Load balancers use a variety of techniques to route the request
to the appropriate server. Most of those load-balancing
techniques require that each server be an exact replica of the
primary Web site. Load balancers do not take into account the
"network distance" between the client and candidate mirror
servers.

Assuming that client protocols cannot easily change, there are
two major problems in the deployment of replicated resources.
The first is how to select which copy of the resource to use.
That is, when a request for a resource is made to a server, how
should the choice of a replica of the server (or of that data)
be made. We call this problem the "rendezvous problem". There
are a number of ways to get clients to rendezvous at distant
mirror servers. These technologies, like load balancers, must
route a request to an appropriate server, but unlike load
balancers, they take network performance and topology into
account in making the determination.

A number of companies offer products which improve network
performance by prioritizing and filtering network traffic.

Proxy caches provide a way for client aggregators to reduce
network resource consumption by storing copies of popular
resources close to the end users. A client aggregator is an
Internet service provider or other organization that brings a
large number of clients operating browsers to the Internet.
Client aggregators may use proxy caches to reduce the bandwidth
required to serve web content to these browsers. However,
traditional proxy caches are operated on behalf of Web clients
rather than Web publishers.

Proxy caches store the most popular resources from all
publishers, which means they must be very large to achieve
reasonable cache efficiency. (The efficiency of a cache is
defined as the number of requests for resources which are
already cached divided by the total number of requests.)

Proxy caches depend on cache control hints delivered with
resources to determine when the resources should be replaced.
These hints are predictive, and are necessarily often
incorrect, so proxy caches frequently serve stale data. In many
cases, proxy cache operators instruct their proxy to ignore
hints in order to make the cache more efficient, even though
this causes it to more frequently serve stale data.

Proxy caches hide the activity of clients from publishers. Once
a resource is cached, the publisher has no way of knowing how
often it was accessed from the cache.

SUMMARY OF THE INVENTION

This invention provides a way for servers in a computer network
to off-load the processing of requests for selected resources
[...]. [...] can be made dynamically, based on information
about possible repeaters.

If a requested resource contains references to other resources,
some or all of these references can be replaced by references
to repeaters.

Accordingly, in one aspect, this invention is a method of
processing resource requests in a computer network. First a
client makes a request for a particular resource from an
origin server, the request including a resource identifier for
the particular resource, the resource identifier sometimes
including an indication of the origin server. Requests arriving
at the origin server do not always include an indication
of the origin server; since they are sent to the origin server,
they do not need to name it. A mechanism referred to as a
reflector, co-located with the origin server, intercepts the 
request from the client to the origin server and decides 
whether to reflect the request or to handle it locally. If the 
reflector decides to handle the request locally, it forwards it 
to the origin server, otherwise it selects a "best" repeater to 
process the request. If the request is reflected, the client is 
provided with a modified resource identifier designating the 
repeater. 

The client gets the modified resource identifier from the 
reflector and makes a request for the particular resource 
from the repeater designated in the modified resource iden- 
tifier^ 

When the repeater gets the client's request, it.respondsby 
returning.Oie.requested.resource.to.the.client. i f the repeate r 
has a local copy of the resource then it rem*rnsnRTr%opY: 



10 



The selection by the reflector of an appropriate repeater to
handle the request can be done in a number of ways. In the
preferred embodiment, it is done by first pre-partitioning the
network into "cost groups" and then determining which cost
group the client is in. Next, from a plurality of repeaters in
the network, a set of repeaters is selected, the members of
the set having a low cost relative to the cost group which the
client is in. In order to determine the lowest cost, a table is
maintained and regularly updated to define the cost between
each group and each repeater. Then one member of the set
is selected, preferably randomly, as the best repeater.

If the particular requested resource itself can contain
identifiers of other resources, then the resource may be
rewritten (before being provided to the client). In particular,
the resource is rewritten to replace at least some of the
resource identifiers contained therein with modified resource
identifiers designating a repeater instead of the origin server.
As a consequence of this rewriting process, when the client
requests other resources based on identifiers in the particular
requested resource, the client will make those requests
directly to the selected repeater, bypassing the reflector and
origin server entirely.

Resource rewriting must be performed by reflectors. It
may also be performed by repeaters, in the situation where
repeaters "peer" with one another and make copies of
resources which include rewritten resource identifiers that
designate a repeater.

In a preferred embodiment, the network is the Internet and
the resource identifier is a uniform resource locator (URL)
for designating resources on the Internet, and the modified
resource identifier is a URL designating the repeater and
indicating the origin server (as described in step B3 below),
and the modified resource identifier is provided to the client
using a REDIRECT message. Note, only when the reflector
is "reflecting" a request is the modified resource identifier
provided using a REDIRECT message.

In another aspect, this invention is a computer network
comprising a plurality of origin servers, at least some of the
origin servers having reflectors associated therewith, and a
plurality of repeaters.

BRIEF DESCRIPTION OF THE DRAWINGS

The above and other objects and advantages of the
invention will be apparent upon consideration of the following
detailed description, taken in conjunction with the
accompanying drawings, in which the reference characters
refer to like parts throughout and in which:

FIG. 1 depicts a portion of a network environment according
to the present invention; and

FIGS. 2-6 are flow charts of the operation of the present
invention.

DETAILED DESCRIPTION OF THE
PRESENTLY PREFERRED EXEMPLARY
EMBODIMENTS


Overview 

FIG. 1 shows a portion of a network environment 100 
according to the present invention, wherein a mechanism 
(reflector 108, described in detail below) at a server (herein 
origin server 102) maintains and keeps track of a number of 
partially replicated servers or repeaters 104a, 104b, and
104c. Each repeater 104a, 104b, and 104c replicates some or
all of the information available on the origin server 102 as
well as information available on other origin servers in the
network 100. Reflector 108 is connected to a particular
repeater known as its "contact" repeater ("Repeater B" 104b
in the system depicted in FIG. 1). Preferably each reflector
maintains a connection with a single repeater known as its
contact, and each repeater maintains a connection with a
special repeater known as its master repeater (e.g., repeater
104m for repeaters 104a, 104b and 104c in FIG. 1).

Thus, a repeater can be considered as a dedicated proxy 
server that maintains a partial or sparse mirror of the origin 
server 102, by implementing a distributed coherent cache of 
the origin server. A repeater may maintain a (partial) mirror 
of more than one origin server. In some embodiments, the 
network 100 is the Internet and repeaters mirror selected 
resources provided by origin servers in response to clients' 
HTTP (hypertext transfer protocol) and FTP (file transfer
protocol) requests. 

A client 106 connects, via the network 100, to origin 
server 102 and possibly to one or more repeaters 104a etc. 

Origin server 102 is a server at which resources originate. 
More generally, the origin server 102 is any process or 
collection of processes that provide resources in response to 
requests from a client 106. Origin server 102 can be any 
off-the-shelf Web server. In a preferred embodiment, origin
server 102 is typically a Web server such as the Apache
server or Netscape Communications Corporation's
Enterprise™ server.

Client 106 is a processor requesting resources from origin 
server 102 on behalf of an end user. The client 106 is 
typically a user agent (e.g., a Web browser such as Netscape 
Communications Corporation's Navigator™) or a proxy for 
a user agent. Components other than the reflector 108 and 
the repeaters 104a, 104b, etc., may be implemented using
commonly available software programs. In particular, this 
invention works with any HTTP client (e.g., a Web browser), 
proxy cache, and Web server. In addition, the reflector 108 
might be fully integrated into the data server 112 (for 
instance, in a Web Server). These components might be 
loosely integrated based on the use of extension mechanisms 
(such as so-called add-in modules) or tightly integrated by 
modifying the service component specifically to support the 
repeaters. 

Resources originating at the origin server 102 may be 
static or dynamic. That is, the resources may be fixed or they 
may be created by the origin server 102 specifically in 
response to a request. Note that the terms "static" and 
"dynamic" are relative, since a static resource may change at
some regular, albeit long, interval.

Resource requests from the client 106 to the origin server
102 are intercepted by reflector 108 which for a given
request either forwards the request on to the origin server
102 or reflects it to one of the repeaters 104a, 104b,
etc. in the network 100. That is, depending on the nature of
the request by the client 106 to the origin server 102, the
reflector 108 either serves the request locally (e.g., at the
origin server 102), or selects one of the repeaters (preferably
the best repeater for the job) and reflects the request to the
selected repeater.

The notion of a best repeater and the manner in which the
best repeater is selected are described in detail below.

Repeaters 104a, 104b, etc. are intermediate processors [...].
Within repeaters 104a, 104b, etc., are any processes or
collections of processes that deliver resources to the client
[...] transactions with the origin server 102.

The reflector 108 is a mechanism, preferably a software
program, that intercepts requests that would normally be
sent directly to the origin server 102. While shown in the
drawings as separate components, the reflector 108 and the
origin server 102 are typically co-located, e.g., on a
particular system such as data server 112. (As discussed below,
the reflector 108 may even be a "plug in" module that becomes
part of the origin server 102.)

FIG. 1 shows only a part of a network 100 according to
this invention. A complete operating network consists of any
number of clients, repeaters, reflectors, and origin servers.
Reflectors communicate with the repeater network, and
repeaters in the network communicate with one another.

Uniform Resource Locators

Each location in a computer network has an address
which can generally be specified as a series of names or
numbers. In order to access information, an address for that
information must be known. For example, on the World
Wide Web ("the Web"), which is a subset of the Internet, the
form of such addresses has been standardized into Uniform
Resource Locators (URLs). URLs specify the location of
resources (information, data files, etc.) on the network.

[...]

On the Internet in general, and the World Wide Web
specifically, documents can be created using a standardized
form known as the Hypertext Markup Language (HTML). In HTML,
a document consists of data (text, images, sounds, and the
like), including links to other sections of the same document
or to other documents. The links are generally provided as
URLs, and can be in relative or absolute form. Relative URLs
simply omit the parts of the URL which are the same as for the
document including the link, such as the address of the
document (when linking to the same document), etc. In general,
a browser program will fill in missing parts of a URL using the
corresponding parts from the current document, thereby forming
a fully formed URL including a fully qualified domain name,
etc.

[...] other documents, and each of those other documents may
be on a different server in a different part of the world [...]
Russia, Africa, China and Australia. A user viewing the
document at a particular client can follow any of the links.

URLs have the following form (defined in detail in T.
Berners-Lee et al., Uniform Resource Locators (URL),
Network Working Group, Request for Comments: 1738,
Category: Standards Track, December 1994, located at
"http://ds.internic.net/rfc/rfc1738.txt", which is hereby
incorporated herein by reference):

scheme://host[:port]/url-path

where "scheme" can be a symbol such as "file" (for a file on
the local system), "ftp" (for a file on an FTP server),
"http" (for a file on a Web server), and "telnet" (for a
connection to a Telnet-based service). Other schemes can also
be used and new schemes are added every now and then. The port
number is optional, the system substituting a default port
number (depending on the scheme) if none is provided. The
"host" field maps to a particular network address for a
particular computer. The "url-path" is relative to the computer
specified in the "host" field. A url-path is typically, but not
necessarily, the pathname of a file in a web server directory.

For example, the following is a URL identifying a file "F"
in the path "A/B/C" on a computer at "www.uspto.gov":

http://www.uspto.gov/A/B/C/F

In order to access the file "F" (the resource) specified by
the above URL, a program (e.g., a browser) running on a
user's computer (i.e., a client computer) would have to first
locate the computer (i.e., a server computer) specified by the
host name. That is, the program would have to locate the server
"www.uspto.gov". To do this it would access a Domain
Name Server (DNS), providing the DNS with the host name
("www.uspto.gov"). The DNS acts as a kind of centralized
directory for resolving addresses from names. [...]

The program then opens a connection to the HTTP server (Web
server) on the remote computer "www.uspto.gov" and uses
the connection to send a request message to the remote
computer (using the HTTP scheme). The message is typically
an HTTP GET request which includes the url-path of
the requested resource, "A/B/C/F". The HTTP server
receives the request and uses it to access the resource
specified by the url-path "A/B/C/F". The server returns the
resource over the same connection.

Thus, conventionally HTTP client requests for Web
resources at an origin server 102 are processed as follows
(see FIG. 2) (This is a description of the process when no
reflector 108 is installed.):

A1. A browser (e.g., Netscape's Navigator) at the client
receives a resource identifier (i.e., a URL) from a user.

A2. The browser extracts the host (origin server) name
from the resource identifier, and uses a domain name
server (DNS) to look up the network (IP) address of the
corresponding server. The browser also extracts a port
number, if one is present, or uses a default port number
(the default port number for http requests is 80).

A3. The browser uses the server's network address and
port number to establish a connection between the
client 106 and the host or origin server 102.

A4. The client 106 then sends a (GET) request over the
connection identifying the requested resource.

A5. The origin server 102 receives the request and

A6. locates or composes the corresponding resource.

A7. The origin server 102 then sends back to the client
106 a reply containing the requested resource (or some
form of error indicator if the resource is unavailable).
The reply is sent to the client over the same connection
as that on which the request was received from the
client.

A8. The client 106 receives the reply from the origin
server 102.
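
As an illustration only, steps A2 through A4 can be sketched in a few lines of Python. The function name `plan_request`, the `DEFAULT_PORTS` table and the HTTP/1.0 request line are assumptions made for this sketch; the text above specifies only that the default port for http requests is 80. No DNS lookup or network connection is performed.

```python
from urllib.parse import urlsplit

# Only the http default (80) is stated in the text above.
DEFAULT_PORTS = {"http": 80}


def plan_request(url):
    """Return (host, port, request_line) for steps A2-A4, without any network I/O."""
    parts = urlsplit(url)
    host = parts.hostname                      # A2: extract the host name
    port = parts.port                          # A2: explicit port, if present
    if port is None:
        port = DEFAULT_PORTS[parts.scheme]     # A2: otherwise the scheme default
    path = parts.path or "/"
    request_line = f"GET {path} HTTP/1.0"      # A4: request naming the resource
    return host, port, request_line


print(plan_request("http://www.uspto.gov/A/B/C/F"))
```

A real client would next resolve `host` through DNS and open a connection to the resulting address and `port` (step A3) before sending the request line.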

There are many variations of this basic model. For
example, in one variation, instead of providing the client
with the resource, the origin server can tell the client to
re-request the resource by another name. To do so, in A7 the
server 102 sends back to the client 106 a reply called a
"REDIRECT" which contains a new URL indicating the
other name. The client 106 then repeats the entire sequence,
normally without any user intervention, this time requesting
the resource identified by the new URL.

System Operation 

In this invention reflector 108 effectively takes the place
of an ordinary Web server or origin server 102. The reflector
108 does this by taking over the origin server's IP address
and port number. In this way, when a client tries to connect
to the origin server 102, it will actually connect to the
reflector 108. The original Web server (or origin server 102)
must then accept requests at a different network (IP) address,
or at the same IP address but on a different port number.
Thus, using this invention, the server referred to in A3-A7
above is actually a reflector 108.

Note that it is also possible to leave the origin server's
network address as it is and to let the reflector run at a
different address or on a different port. In this way the
reflector does not intercept requests sent to the origin server,
but can still be sent requests addressed specifically to the
reflector. Thus the system can be tested and configured
without interrupting its normal operation. The reflector 108
supports the processing as follows (see FIG. 3): upon receipt
of a request,

B1. The reflector 108 analyzes the request to determine
whether or not to reflect the request. To do this, first the
reflector determines whether the sender (client 106) is
a browser or a repeater. Requests issued by repeaters
must be served locally by the origin server 102. This
determination can be made by looking up the network
(IP) address of the sender in a list of known repeater
addresses. Alternatively, this determination could be made
by attaching information to a request to indicate that the
request is from a specific repeater, or repeaters can request
resources from a special port other than the one used for
ordinary clients.






B2. If the request is not from a repeater, the reflector looks
up the requested resource in a table (called the "rule
base") to determine whether the resource requested is
"repeatable". Based on this determination, the reflector
either reflects the request (B3, described below) or
serves the request locally (B4, described below).

The rule base is a list of regular expressions and associated
attributes. (Regular expressions are well-known in the
field of computer science. A small bibliography of their use
is found in Aho, et al., "Compilers, Principles, Techniques
and Tools", Addison-Wesley, 1986.) The resource identifier
(URL) for a given request is looked up in the rule base by
matching it sequentially with each regular expression. The
first match identifies the attributes for the resource, namely
repeatable or local. If there is no match in the rule base, a
default attribute is used. Each reflector has its own rule
base, which is manually configured by the reflector operator.
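
The decision made in B1 and B2 can be sketched as follows. This is a minimal illustration, not the patented implementation: the repeater addresses, the two rules and the default attribute are hypothetical values chosen for the example, and the rule base is modelled simply as an ordered list of compiled regular expressions paired with an attribute.

```python
import re

# Hypothetical list of known repeater addresses (B1).
KNOWN_REPEATERS = {"10.0.0.5", "10.0.0.6"}

# Hypothetical rule base (B2): ordered (regular expression, attribute) pairs.
RULE_BASE = [
    (re.compile(r"^/cgi-bin/"), "local"),
    (re.compile(r"\.(gif|jpg|html)$"), "repeatable"),
]
DEFAULT_ATTRIBUTE = "local"  # assumed default when no rule matches


def classify(url_path):
    """Return the attribute of the first rule that matches, else the default."""
    for pattern, attribute in RULE_BASE:
        if pattern.search(url_path):
            return attribute
    return DEFAULT_ATTRIBUTE


def should_reflect(sender_ip, url_path):
    """B1: never reflect requests from repeaters. B2: consult the rule base."""
    if sender_ip in KNOWN_REPEATERS:
        return False
    return classify(url_path) == "repeatable"


print(should_reflect("192.0.2.9", "/images/logo.gif"))
```

Because the rules are tried in order, a path such as `/cgi-bin/search.html` is classified as local even though it also matches the second rule.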

B3. To reflect a request (to serve a request locally go to
B4), as shown in FIG. 4, the reflector determines (B3-1)
the best repeater to reflect the request to, as described
in detail below. The reflector then creates (B3-2) a new
resource identifier (URL) (using the requested URL
and the best repeater) that identifies the same resource
at the selected repeater.

It is necessary that the reflection step create a single URL
containing the URL of the original resource, as well as the
identity of the selected repeater. A special form of URL is
created to provide this information. This is done by creating
a new URL as follows:

D1. Given a repeater name, scheme, origin server name
and path, create a new URL. If the scheme is "http", the
preferred embodiment uses the following format:

http://<repeater>/<server>/<path>

If the scheme is other than "http", the preferred embodiment
uses the following format:

http://<repeater>/<server>@proxy=<scheme>@/<path>

The reflector can also attach a MIME type to the request, to
cause the repeater to provide that MIME type with the result.
This is useful because many protocols (such as FTP) do not
provide a way to attach a MIME type to a resource. The
format is

http://<repeater>/<server>@proxy=<scheme>:<type>@/<path>

This URL is interpreted when received by the repeater.

The reflector then sends (B3-3) a REDIRECT reply
containing this new URL to the requesting client. The HTTP
REDIRECT command allows the reflector to send the
browser a single URL to retry the request.
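
The three URL formats of step D1 can be illustrated with a short Python sketch. The function name `reflected_url` and the host names in the usage line are hypothetical. The text does not say which format applies when a MIME type is attached to an "http" resource; the sketch assumes the `@proxy=` form is used in that case.

```python
def reflected_url(repeater, scheme, server, path, mime_type=None):
    """Build the URL a reflector places in its REDIRECT reply (step D1)."""
    path = path.lstrip("/")
    if scheme == "http" and mime_type is None:
        # Plain http resource: http://<repeater>/<server>/<path>
        return f"http://{repeater}/{server}/{path}"
    # Other schemes, optionally with a MIME type:
    # http://<repeater>/<server>@proxy=<scheme>[:<type>]@/<path>
    proxy = scheme if mime_type is None else f"{scheme}:{mime_type}"
    return f"http://{repeater}/{server}@proxy={proxy}@/{path}"


print(reflected_url("repeater1.example.net", "http", "www.example.com", "/A/B/C/F"))
```

Note that the result always uses the "http" scheme toward the repeater; the original scheme survives only inside the `@proxy=` marker.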

B4. To serve a request locally, the request is sent by the 
reflector to ("forwarded to") the origin server 102. In 
this mode, the reflector acts as a reverse proxy server. 
The origin server 102 processes the request in the 
normal manner (A5-A7). The reflector then obtains the 
origin server's reply to the request which it inspects to 
determine if the requested resource is an HTML 
document, i.e., whether the requested resource is one 
which itself contains resource identifiers. 
B5. If the resource is an HTML document then the 
reflector rewrites the HTML document by modifying 
resource identifiers (URLs) within it, as described 
below. The resource, possibly as modified by rewriting, 
is then returned in a reply to the requesting client 106. 
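
The rewriting in B5 can be pictured with the following deliberately simplified sketch. It is an assumption-laden illustration: it rewrites only absolute `http://` URLs in double-quoted `href` and `src` attributes that point at one given origin server, it ignores relative URLs, and it takes the repeater name as a parameter. How the reflector actually chooses which identifiers to rewrite, and which repeater to name, is not shown here.

```python
import re


def rewrite_html(html, origin, repeater):
    """Point absolute links to `origin` at `repeater`, keeping the origin name in the path."""
    pattern = re.compile(
        r'((?:href|src)\s*=\s*")http://' + re.escape(origin) + r"/",
        re.IGNORECASE,
    )
    # Keep the attribute prefix, then substitute http://<repeater>/<server>/
    return pattern.sub(lambda m: f"{m.group(1)}http://{repeater}/{origin}/", html)


page = '<a href="http://www.example.com/a.html">a</a>'
print(rewrite_html(page, "www.example.com", "repeater1.example.net"))
```

Links to other servers are left untouched, so a client following them still goes to those servers directly.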
If the requesting client is a repeater, the reflector may
temporarily disable any cache-control modifiers which the
origin server attached to the reply. These disabled cache-control
modifiers are later re-enabled when the content is
served from the repeater. This mechanism makes it possible
for the origin server to prevent resources from being cached 
at normal proxy caches, without affecting the behavior of the 
cache at the repeater. 

B6. Whether the request is reflected or handled locally, 
details about the transaction, such as the current time, 
the address of the requester, the URL requested, and the 
type of response generated, are written by the reflector 
to a local log file. 
By using a rule base (B2), it is possible to selectively 
reflect resources. There are a number of reasons that certain 
particular resources cannot be effectively repeated (and 
therefore should not be reflected), for instance: 

the resource is composed uniquely for each request; 
the resource relies on a so-called cookie (browsers will 
not send cookies to repeaters with different domain 
names); 

the resource is actually a program (such as a Java applet) 
that will run on the client and that wishes to connect to 
a service (Java requires that the service be running on 
the same machine that provided the applet). 
In addition, the reflector 108 can be configured so that
requests from certain network addresses (e.g., requests from
clients on the same local area network as the reflector itself)
are never reflected. Also, the reflector may choose not to
reflect requests because the reflector is exceeding its
committed aggregate information rate, as described below.

A request which is reflected is automatically mirrored at
the repeater when the repeater receives and processes the
request.

The combination of the reflection process described here 
and the caching process described below effectively creates 
a system in which repeatable resources are migrated to and 
mirrored at the selected repeater, while non-repeatable
resources are not mirrored. 

Alternate Approach 

Placing the origin server name in the reflected URL is 
generally a good strategy, but it may be considered unde- 
sirable for aesthetic or (in the case, e.g., of cookies) certain 
technical reasons. 

It is possible to avoid the need for placing both the 
repeater name and the server name in the URL. Instead, a 
"family" of names may be created for a given origin server, 
each name identifying one of the repeaters used by that 
server.

For instance, if www.example.com is the origin server,
names for three repeaters might be created:

wr1.example.com
wr2.example.com
wr3.example.com

The name "wr1.example.com" would be an alias for
repeater 1, which might also be known by other names such
as "wr1.anotherExample.com" and "wr1.example.edu".

If the repeater can determine by which name it was 
addressed, it can use this information (along with a table that 
associates repeater alias names with origin server names) to 
determine which origin server is being addressed. For 
instance, if repeater 1 is addressed as "wr1.example.com",
then the origin server is "www.example.com"; if it is
addressed as "wr1.anotherExample.com", then the origin
server is "www.anotherExample.com".

The repeater can use two mechanisms to determine by 
which alias it is addressed: 



Each alias can be associated with a different IP address. 
Unfortunately, this solution does not scale well, as IP 
addresses are currently scarce, and the number of IP 
addresses required grows as the product of origin 
servers and repeaters. 

The repeater can attempt to determine the alias name 
used by inspecting the "host:" tag in the HTTP header 
of the request. Unfortunately, some old browsers still in 
use do not attach the "host:" tag to a request. Reflectors 
would need to identify such browsers (the browser 
identity is a part of each request) and avoid this form of 
reflection. 
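
The second mechanism, combined with the alias table mentioned above, might look like the following sketch. The table contents reuse the example names from the text; the function name `origin_for_request` and the representation of request headers as a plain dictionary are assumptions made for illustration.

```python
# Hypothetical table associating repeater alias names with origin server names.
ALIAS_TO_ORIGIN = {
    "wr1.example.com": "www.example.com",
    "wr1.anotherexample.com": "www.anotherExample.com",
}


def origin_for_request(headers):
    """Return the origin server implied by the Host header, or None if unknown."""
    host = None
    for name, value in headers.items():
        if name.lower() == "host":     # header names are case-insensitive
            host = value
            break
    if host is None:
        return None                    # old browser: no Host header was sent
    alias = host.split(":", 1)[0].lower()   # drop any port, normalize case
    return ALIAS_TO_ORIGIN.get(alias)


print(origin_for_request({"Host": "wr1.example.com"}))
```

A `None` result corresponds to the case described above in which this form of reflection cannot be used for the requesting browser.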
How a Repeater Handles a Request 
When a browser receives a REDIRECT response (as
produced in B3), it reissues a request for the resource using
the new resource identifier (URL) (A1-A5). Because the
new identifier refers to a repeater instead of the origin server,
the browser now sends a request for the resource to the
repeater which processes a request as follows, with reference
to FIG. 5:

C1. First the repeater analyzes the request to determine
the network address of the requesting client and the
path of the resource requested. Included in the path is
an origin server name (as described above with reference
to B3).

C2. The repeater uses an internal table to verify that the 
origin server belongs to a known "subscriber". A sub- 
scriber is an entity (e.g., a company) that publishes 
resources (e.g., files) via one or more origin servers. 
When the entity subscribes, it is permitted to utilize the 
repeater network. The subscriber tables described 
below include the information that is used to link 
reflectors to subscribers. 
If the request is not for a resource from a known 
subscriber, the request is rejected. To reject a request, the 
repeater returns a reply indicating that the requested 
resource does not exist. 
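
Steps C1 and C2 can be sketched as the inverse of the URL construction in D1, followed by a table lookup. The names `parse_reflected_path`, `admit` and `SUBSCRIBER_ORIGINS` are hypothetical, and the subscriber table is reduced to a set of origin server names for the purposes of this example.

```python
# Hypothetical subscriber table: origin servers allowed to use the repeater.
SUBSCRIBER_ORIGINS = {"www.example.com", "ftp.example.com"}


def parse_reflected_path(path):
    """Split a reflected path into (server, scheme, mime_type, resource_path)."""
    body = path.lstrip("/")
    if "@proxy=" in body:
        # /<server>@proxy=<scheme>[:<type>]@/<path>
        server, _, tail = body.partition("@proxy=")
        proxy, _, rest = tail.partition("@/")
        scheme, _, mime = proxy.partition(":")
        return server, scheme, (mime or None), "/" + rest
    # /<server>/<path>
    server, _, rest = body.partition("/")
    return server, "http", None, "/" + rest


def admit(path):
    """C2: return (server, resource_path), or None if the origin is not a subscriber."""
    server, _scheme, _mime, resource_path = parse_reflected_path(path)
    if server not in SUBSCRIBER_ORIGINS:
        return None   # caller replies that the resource does not exist
    return server, resource_path


print(admit("/www.example.com/A/B/C/F"))
```

Returning `None` leaves it to the caller to produce the "resource does not exist" reply described above.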

C3. The repeater then determines whether the requested
resource is cached locally. If the requested resource is
in the repeater's cache it is retrieved. On the other hand,
if a valid copy of the requested resource is not in the
repeater's cache, the repeater modifies the incoming
URL, creating a request that it issues directly to the
originating reflector which processes it (as in B1-B6).
Because this request to the originating reflector is from
a repeater, the reflector always returns the requested
resource rather than reflecting the request. (Recall that
reflectors always handle requests from repeaters
locally.) If the repeater obtained the resource from the
origin server, the repeater then caches the resource
locally.

If a resource is not cached locally, the cache can query its
"peer caches" to see if one of them contains the resource,
before or at the same time as requesting the resource from
the reflector/origin server. If a peer cache responds posi-
tively in a limited period of time (preferably a small fraction
of a second), the resource will be retrieved from the peer
cache.

C4. The repeater then constructs a reply including the 
requested resource (which was retrieved from the cache 
or from the origin server) and sends that reply to the 
requesting client. 

C5. Details about the transaction, such as the associated 
reflector, the current time, the address of the requester, 
the URL requested, and the type of response generated, 
are written to a local log file at the repeater. 
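
Steps C1-C5 can be sketched in Python as follows. This is a minimal illustration only, not the implementation described here: the class name `Repeater`, the callable `fetch_from_reflector`, the in-memory `cache` and `log`, and the assumption that the request path has the shape `/<origin server>/<resource path>` are all hypothetical. Peer-cache queries are omitted.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Repeater:
    # Origin-server names belonging to known subscribers (used in C2).
    subscribers: set
    # Hypothetical helper: asks the originating reflector for a resource.
    fetch_from_reflector: Callable[[str, str], bytes]
    cache: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def handle(self, client_addr: str, url_path: str):
        # C1: split the path into origin-server name and resource path
        # (assumed layout: "/<origin server>/<resource path>").
        origin, _, resource = url_path.lstrip("/").partition("/")
        if origin not in self.subscribers:
            # C2: unknown subscriber, reply as if the resource does not exist.
            status, body = 404, b""
        else:
            key = (origin, resource)
            if key not in self.cache:
                # C3: cache miss, fetch from the reflector and cache locally.
                self.cache[key] = self.fetch_from_reflector(origin, "/" + resource)
            status, body = 200, self.cache[key]
        # C5: record details of the transaction in the local log.
        self.log.append((time.time(), client_addr, url_path, status))
        # C4: reply to the requesting client.
        return status, body
```

A second request for the same path is answered from `cache` without calling `fetch_from_reflector` again.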






US 6,185,598 B1






Note that the bottom row of FIG. 2 refers to an origin
server, or a reflector, or a repeater, depending on what the
URL in step A1 identifies.

Selecting the Best Repeater
If the reflector 108 determines that it will reflect the
request, it must then select the best repeater to handle that
request (as referred to in step B3-1). This selection is
performed by the Best Repeater Selector (BRS) mechanism
described here.

The goal of the BRS is to select, quickly and heuristically,
an appropriate repeater for a given client given only the
network address of the client. An appropriate repeater is one
which is not too heavily loaded and which is not too far from
the client in terms of some measure of network distance. The
mechanism used here relies on specific, compact, pre-
computed data to make a fast decision. Other, dynamic
solutions can also be used to select an appropriate repeater.

The BRS relies on three pre-computed tables, namely the
Group Reduction Table, the Link Cost Table, and the Load
Table. These three tables (described below) are computed
off-line and downloaded to each reflector by its contact in
the repeater network.

The Group Reduction Table places every network address
into a group, with the goal that addresses in a group share
relative costs, so that they would have the same best repeater
under varying conditions (i.e., the BRS is invariant over the
members of the group).

The Link Cost Table is a two dimensional matrix which
specifies the current cost between each repeater and each
group. Initially, the link cost between a repeater and a group
is defined as the "normalized link cost" between the repeater
and the group, as defined below. Over time, the table will be
updated with measurements which more accurately reflect
the relative cost of transmitting a file between the repeater
and a member of the group. The format of the Link Cost
Table is <Group ID><Group ID><link cost>, where the
Group ID's are given as AS numbers.

The Load Table is a one dimensional table which identi-
fies the current load at each repeater. Because repeaters may
have different capacities, the load is a value that represents
the ability of a given repeater to accept additional work.
Each repeater sends its current load to a central master
repeater at regular intervals, preferably at least approxi-
mately once a minute. The master repeater broadcasts the
Load Table to each reflector in the network, via the contact
repeater.

A reflector is provided entries in the Load Table only for
repeaters which it is assigned to use. The assignment of
repeaters to reflectors is performed centrally by a repeater
network operator at the master repeater. This assignment
makes it possible to modify the service level of a given
reflector. For instance, a very active reflector may use many
repeaters, whereas a relatively inactive reflector may use few
repeaters.

Tables may also be configured to provide selective
repeater service to subscribers in other ways, e.g., for their
clients in specific geographic regions, such as Europe or
Asia.

Measuring Load
In the presently preferred embodiments, repeater load is
measured in two dimensions, namely

1. requests received by the repeater per time interval
(RRPT), and

2. bytes sent by the repeater per time interval (BSPT).

For each of these dimensions, a maximum capacity set-
ting is set. The maximum capacity indicates the point at
which the repeater is considered to be fully loaded. A higher
RRPT capacity generally indicates a faster processor,
whereas a higher BSPT capacity generally indicates a wider
network pipe. This form of load measurement assumes that
a given server is dedicated to the task of repeating.

Each repeater regularly calculates its current RRPT and
BSPT, by accumulating the number of requests received and
bytes sent over a short time interval. These measurements
are used to determine the repeater's load in each of these
dimensions. If a repeater's load exceeds its configured
capacity, an alarm message is sent to the repeater network
administrator.

The two current load components are combined into a
single value indicating overall current load. Similarly, the
two maximum capacity components are combined into a
single value indicating overall maximum capacity. The
components are combined as follows:

current-load = B x current RRPT + (1-B) x current BSPT
max-load = B x max RRPT + (1-B) x max BSPT

The factor B, a value between 0 and 1, allows the relative
weights of RRPT and BSPT to be adjusted, which favors
consideration of either processing power or bandwidth.

The overall current load and overall maximum capacity
values are periodically sent from each repeater to the master
repeater, where they are aggregated in the Load Table, a
table summarizing the overall load for all repeaters. Changes
in the Load Table are distributed automatically to each
reflector.

While the preferred embodiment uses a two-dimensional
measure of repeater load, any other measure of load can be
used.

Combining Link Costs and Load
The BRS computes the cost of servicing a given client
from each eligible repeater. The cost is computed by com-
bining the available capacity of the candidate repeater with
the cost of the link between that repeater and the client. The
link cost is computed by simply looking it up in the Link
Cost table.

The cost is determined using the following formula:

threshold = K x max-load
capacity = max(max-load - current-load, e)
capacity = min(capacity, threshold)
cost = link-cost x threshold / capacity

In this formula, e is a very small number (epsilon) and K
is a tuning factor initially set to 0.5. This formula causes the
cost to a given repeater to be increased, at a rate defined by
K, if its capacity falls below a configurable threshold.

Given the cost of each candidate repeater, the BRS selects
all repeaters within a delta factor of the best score. From this
set, the result is selected at random.

The delta factor prevents the BRS from repeatedly select-
ing a single repeater when scores are similar. It is generally
required because available information about load and link
costs loses accuracy over time. This factor is tunable.

Best Repeater Selector (BRS)
The BRS operates as follows, with reference to FIG. 6:
Given a client network address and the three tables
described above:

E1. Determine which group the client is in using the
Group Reduction Table.

E2. For each repeater in the Link Cost Table and Load
Table, determine that repeater's combined cost as fol-
lows:



E2a. Determine the maximum and current load on the
repeater (using the Load Table).
E2b. Determine the link cost between the repeater and
the client's group (using the Link Cost Table).
E2c. Determine the combined cost as described above.

E3. Select a small set of repeaters with the lowest cost.

E4. Select a random member from the set.
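
The load combination, the cost formula and steps E2-E4 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the implementation described here: the function names, the dictionary layouts for the Link Cost Table and Load Table, the default value of `eps`, and the reading of "within a delta factor" as a multiplicative bound on the best cost are all assumptions. The client's group (step E1) is taken as an input.

```python
import random


def combined_load(rrpt, bspt, b=0.5):
    """Blend requests-per-interval and bytes-per-interval with weight B in [0, 1]."""
    return b * rrpt + (1 - b) * bspt


def repeater_cost(link_cost, max_load, current_load, k=0.5, eps=1e-9):
    """Cost of serving a client from one repeater, per the formula in the text."""
    threshold = k * max_load
    capacity = max(max_load - current_load, eps)  # never zero or negative
    capacity = min(capacity, threshold)           # spare capacity above threshold is ignored
    return link_cost * threshold / capacity


def select_repeater(group, link_costs, loads, delta=1.1, rng=random):
    """Pick a repeater for a client group.

    link_costs: {(repeater, group): link cost}     (assumed layout)
    loads:      {repeater: (max_load, current_load)}  (assumed layout)
    delta:      assumed multiplicative tolerance around the best cost
    """
    costs = {}
    for rep, (max_load, current_load) in loads.items():      # E2, E2a
        if (rep, group) in link_costs:                        # E2b
            costs[rep] = repeater_cost(                       # E2c
                link_costs[(rep, group)], max_load, current_load)
    best = min(costs.values())  # raises ValueError if no repeater is eligible
    candidates = sorted(r for r, c in costs.items() if c <= best * delta)  # E3
    return rng.choice(candidates)                             # E4
```

With K = 0.5, a repeater whose spare capacity is at least half its maximum load is charged exactly its link cost; below that threshold the cost grows as spare capacity shrinks.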

Preferably the results of the BRS processing are main-
tained in a local cache at the reflector 108. Thus, if the best
repeater has recently been determined for a given client (i.e.,
for a given network address), that best repeater can be reused
quickly without being re-determined. Since the calculation
described above is based on statically pre-computed tables,
if the tables have not changed, then there is no need to
re-determine the best repeater.
Determining the Group Reduction and Link Cost Tables
The Group Reduction Table and Link Cost Table used in
BRS processing are created and regularly updated by an
independent procedure referred to herein as NetMap. The
NetMap procedure is run by executing several phases
(described below) as needed.

The term Group is used here to refer to an IP "address
group".

The term Repeater Group refers to a Group that contains
the IP address of a repeater.

The term link cost refers to a statically determined cost for 
transmitting data between two Groups. In a presently pre- 
ferred implementation, this is the minimum of the sums of 
the costs of the links along each path between them. The link 
costs of primary concern here are link costs between a Group
and a Repeater Group. 

The term relative link cost refers to the link cost relative 
to other link costs for the same Group which is calculated by 
subtracting the minimum link cost from a Group to any 
Repeater Group from each of its link costs to a Repeater
Group. The term Cost Set refers to a set of Groups that are 
equivalent in regard to Best Repeater Selection. That is, 
given the information available, the same repeater would be 
selected for any of them. 

The NetMap procedure first processes input files to create
an internal database called the Group Registry. These input
files describe groups, the IP addresses within groups, and
links between groups, and come from a variety of sources,
including publicly available Internet Routing Registry (IRR)
databases, BGP router tables, and probe services that are
located at various points around the Internet and use publicly
available tools (such as "traceroute") to sample data paths.
Once this processing is complete, the Group Registry
contains essential information used for further processing,
namely (1) the identity of each group, (2) the set of IP
addresses in a given group, (3) the presence of links between 
groups indicating paths over which information may travel, 
and (4) the cost of sending data over a given link. 

The following processes are then performed on the Group
Registry file.
Calculate Repeater Group link costs
The NetMap procedure calculates a "link cost" for trans-
mission of data between each Repeater Group and each
Group in the Group Registry. This overall link cost is defined
as the minimum cost of any path between the two groups,
where the cost of a path is equal to the sum of the costs of
the individual links in the path. The link cost algorithm
presented below is essentially the same as algorithm #562
from ACM journal Transactions on Mathematical Software:
"Shortest Path From a Specific Node to All Other Nodes in
a Network" by U. Pape, ACM TOMS 6 (1980) pp. 450-455,
http://www.netlib.org/toms/562.



In this processing, the term Repeater Group refers to a 
Group that contains the IP address of a repeater. A group is 
a neighbor of another group if the Group Registry indicates 
that there is a link between the two groups.
For each target Repeater Group T:
  Initialize the link cost between T and itself to zero.
  Initialize the link cost between T and every other Group
  to infinity.
  Create a list L that will contain Groups that are equidistant
  from the target Repeater Group T.
  Initialize the list L to contain just the target Repeater
  Group T itself.
  While the list L is not empty:
    Create an empty list L' of neighbors of members of the
    list L.
    For each Group G in the list L:
      For each Group N that is a neighbor of G:
        Let cost refer to the sum of the link cost between
        T and G, and the link cost between G and N.
        The cost between T and G was determined in
        the previous pass of the algorithm; the link cost
        between G and N is from the Group Registry.
        If cost is less than the link cost between T and N:
          Set the link cost between T and N to cost.
          Add N to L' if it is not already on it.
    Set L to L'.
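
The pass-by-pass relaxation above can be sketched in Python as follows. This is a minimal illustration: the function name and the nested-dictionary representation of the Group Registry links are assumptions, and link costs are assumed to be non-negative so that the loop terminates.

```python
import math


def repeater_group_link_costs(target, links):
    """Minimum path cost from `target` to every Group.

    links: {group: {neighbor: link cost}}  (assumed Group Registry layout)
    """
    # Every Group starts at infinity except the target itself.
    cost = {g: math.inf for g in links}
    for neighbors in links.values():
        for n in neighbors:
            cost.setdefault(n, math.inf)
    cost[target] = 0

    frontier = [target]                       # the list L
    while frontier:
        next_frontier = []                    # the list L'
        for g in frontier:
            for n, link in links.get(g, {}).items():
                candidate = cost[g] + link
                if candidate < cost[n]:
                    cost[n] = candidate
                    if n not in next_frontier:
                        next_frontier.append(n)
        frontier = next_frontier              # set L to L'
    return cost
```

A Group is revisited only when its cost has just decreased, so the loop ends once a full pass improves nothing.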
Calculate Cost Sets 

A Cost Set is a set of Groups that are equivalent with 
respect to Best Repeater Selection. That is, given the infor- 
mation available, the same repeater would be selected for 
any of them. 

The "cost profile" of a Group G is defined herein as the 
set of costs between G and each Repeater. Two cost profiles 
are said to be equivalent if the values in one profile differ 
from the corresponding values in the other profile by a 
constant amount. 

Once a client Group is known, the Best Repeater Selec- 
tion algorithm relies on the cost profile for information about 
the Group. If two cost profiles are equivalent, the BRS 
algorithm would select the same repeater given either pro- 
file. 

A Cost Set is then a set of groups that have equivalent cost 
profiles. 

The effectiveness of this method can be seen, for example, 
in the case where all paths to a Repeater from some Group 
A pass through some other Group B. The two Groups have 
equivalent cost profiles (and are therefore in the same Cost 
Set) since whatever Repeater is best for Group A is also 
going to be best for Group B, regardless of what path is 
taken between the two Groups. 

By normalizing cost profiles, equivalent cost profiles can 
be made identical. A normalized cost profile is a cost profile 
in which the minimum cost has the value zero. A normalized 
cost profile is computed by finding the minimum cost in the 
profile, and subtracting that value from each cost in the 
profile. 

Cost Sets are then computed using the following algo- 
rithm: 

For each Group G:
  Calculate the normalized cost profile for G.
  Look for a Cost Set with the same normalized cost
  profile.
  If such a set is found, add G to the existing Cost Set;
  otherwise, create a new Cost Set with the calculated
  normalized cost profile, containing only G.
The algorithm for finding Cost Sets employs a hash table
to reduce the time necessary to determine whether the
desired Cost Set already exists. The hash table uses a hash
value computed from the cost profile of G.
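
Normalization and the hash-table grouping can be sketched in Python as follows. This is an illustrative sketch: the function names are hypothetical, a Python `dict` stands in for the hash table, each profile is assumed to list costs to the Repeaters in one fixed order, and costs are assumed to be integers so that equal profiles compare exactly.

```python
def normalize(profile):
    """Shift a cost profile so that its minimum cost is zero."""
    low = min(profile)
    return tuple(c - low for c in profile)


def build_cost_sets(profiles):
    """Group together the Groups whose normalized cost profiles are identical.

    profiles: {group: sequence of costs, one per Repeater, in a fixed order}
    Returns {normalized profile: [groups in that Cost Set]}.
    """
    cost_sets = {}
    for group, profile in profiles.items():
        # The dict lookup plays the role of the hash table in the text.
        cost_sets.setdefault(normalize(profile), []).append(group)
    return cost_sets
```

For example, profiles (5, 7, 9) and (1, 3, 5) differ by a constant 4, so both normalize to (0, 2, 4) and fall into the same Cost Set.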

Each Cost Set is then numbered with a unique Cost Set
Index number. Cost Sets are then used in a straightforward
manner to generate the Link Cost Table, which gives the cost
from each Cost Set to each Repeater.

As described below, the Group Reduction Table maps
every IP address to one of these Cost Sets.
Build IP Map
The IP Map is a sorted list of records which map IP
address ranges to Link Cost Table keys. The format of the
IP map is:

<base IP address><max IP address><Link Cost Table key>

where IP addresses are presently represented by 32-bit
integers. The entries are sorted by descending base address,
and by ascending maximum address among equal base
addresses, and by ascending Link Cost Table key among
equal base addresses and maximum addresses. Note that
ranges may overlap.

The NetMap procedure generates an intermediate IP map 
containing a map between IP address ranges and Cost Set 
numbers as follows: 

For each Cost Set S:
  For each Group G in S:
    For each IP address range in G:
      Add a triple (low address, high address, Cost Set
      number of S) to the IP map.

The IP map file is then sorted by descending base address,
and by ascending maximum address among equal base
addresses, and by ascending Cost Set number among equal
base addresses and maximum addresses. The sort order for
the base address and maximum address minimizes the time
to build the Group Reduction Table and produces the proper
results for overlapping entries.
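
The three-level sort order can be expressed as a single sort key. The Python sketch below assumes each IP map entry is a tuple (base address, maximum address, Cost Set number) with addresses held as integers; the function name is hypothetical.

```python
def sort_ip_map(entries):
    """Sort by descending base, then ascending maximum, then ascending Cost Set."""
    # Negating the base address turns Python's ascending sort into a
    # descending one for that component only.
    return sorted(entries, key=lambda e: (-e[0], e[1], e[2]))
```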

Finally, the NetMap procedure creates the Group Reduc-
tion Table by processing the sorted IP map. The Group
Reduction Table maps IP addresses (specified by ranges)
into Cost Set numbers. Special processing of the IP map file
is required in order to detect overlapping address ranges, and
to merge adjacent address ranges in order to minimize the
size of the Group Reduction Table.

An ordered list of address range segments is maintained,
each segment consisting of a base address B and a Cost Set
number N, sorted by base address B. (The maximum address
of a segment is the base address of the next segment minus
one.)
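
Because each segment implicitly ends just before the next one begins, finding the Cost Set for an address reduces to a binary search over base addresses. The Python sketch below illustrates this; the function name and the list-of-tuples layout are assumptions, and the list is assumed to start with a sentinel segment whose base is minus infinity, as in the algorithm that follows.

```python
import bisect


def lookup_cost_set(segments, address):
    """Return the Cost Set of the segment that contains `address`.

    segments: list of (base address, Cost Set number), sorted by base
              address, starting with a (-infinity, NOGROUP) sentinel.
    """
    # Rebuilt on every call for brevity; a real table would keep this list.
    bases = [base for base, _ in segments]
    # Index of the last segment whose base address is <= address.
    i = bisect.bisect_right(bases, address) - 1
    return segments[i][1]
```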

The following algorithm is used: 

Initialize the list with the elements [-infinity,
NOGROUP], [+infinity, NOGROUP].
For each entry in the IP map, in sorted order, consisting
of (b, m, s):
  Insert (b, m, s) in the list (recall that IP map entries
  are of the form (low address, high address, Cost Set
  number of S)).
For each reserved LAN address range (b, m):
  Insert (b, m, LOCAL) in the list.
For each Repeater at address a:
  Insert (a, a, REPEATER) in the list.
For each segment S in the ordered list:
  Merge S with following segments with the same
  Cost Set.
  Create a Group Reduction Table entry with
  base address = the base address of S,
  max address = next segment's base - 1,
  group ID = Cost Set number of S.



A reserved LAN address range is an address range 
reserved for use by LANs which should not appear as a 
global Internet address. LOCAL is a special Cost Set index 
different from all others, indicating that the range maps to a 
client which should never be reflected. REPEATER is a 
special Cost Set index different from all others, indicating 
that the address range maps to a repeater. NOGROUP is a 
special Cost Set index different from all others, indicating 
that this range of addresses has no known mapping. 

Given (B, M, N), insert an entry in the ordered address list 
as follows: 

Find the last segment (AB, AN) for which AB is less than
or equal to B.
If AB is less than B, insert a new segment (B, N) after
(AB, AN).
Find the last segment (YB, YN) for which YB is less than
or equal to M.
Replace by (XB, N) any segment (XB, NOGROUP) for
which XB is greater than B and less than YB.
If YN is not N, and either YN is NOGROUP or YB is less
than or equal to B:
  Let (ZB, ZN) be the segment following (YB, YN).
  If M+1 is less than ZB, insert a new segment (M+1,
  YN) before (ZB, ZN).
  Replace (YB, YN) by (YB, N).
Rewriting HTML Resources 

As explained above with reference to FIG. 3 (B5), when 
a reflector or repeater serves a resource which itself includes 
resource identifiers (e.g., a HTML resource), that resource is 
modified (rewritten) to pre-reflect resource identifiers 
(URLs) of repeatable resources that appear in the resource. 
Rewriting ensures that when a browser requests repeatable 
resources identified by the requested resource, it gets them 
from a repeater without going back to the origin server, but 
when it requests non-repeatable resources identified by the 
requested resource, it will go directly to the origin server. 
Without this optimization, the browser would either make all 
requests at the origin server (increasing traffic at the origin 
server and necessitating far more redirections from the 
origin server), or it would make all requests at the repeater 
(causing the repeater to redundantly request and copy 
resources which could not be cached, increasing the over- 
head of serving such resources). 

Rewriting requires that a repeater has been selected (as 
described above with reference to the Best Repeater 
Selector). Rewriting uses a so-called BASE directive. The 
BASE directive lets the HTML identify a different base 
server. (The base address is normally the address of the 
HTML resource.) 

Rewriting is performed as follows: 
F1. A BASE directive is added at the beginning of the
HTML resource, or modified where necessary. 
Normally, a browser interprets relative URLs as being 
relative to the default base address, namely, the URL of 
the HTML resource (page) in which they are encoun- 
tered. The BASE address added specifies the resource 
at the reflector which originally served the resource. 
This means that unprocessed relative URLs (such as 
those generated by Javascript™ programs) will be 
interpreted as relative to the reflector. Without this 
BASE address, browsers would combine relative 
addresses with repeater names to create URLs which 
were not in the form required by repeaters (as described 
above in step D1).
F2. The rewriter identifies directives, such as embedded
images and anchors, containing URLs. If the rewriter is
running in a reflector, it must parse the HTML file to
identify these directives. If it is running in a repeater,
the rewriter may have access to pre-computed infor-
mation that identifies the location of each URL (placed
in the HTML file in step F4).

F3. For each URL encountered in the resource to be
re-written, the rewriter must determine whether the
URL is repeatable (as in steps B1-B2). If the URL is
not repeatable, it is not modified. On the other hand, if
the URL is repeatable, it is modified to refer to the
selected repeater.

F4. After all URLs have been identified and modified, if
the resource is being served to a repeater, a table is
appended at the beginning of the resource that identifies
the location and content of each URL encountered in
the resource. (This step is an optimization which elimi-
nates the need for parsing HTML resources at the
repeater.)

F5. Once all changes have been identified, a new length
is computed for the resource (page). The length is
inserted in the HTTP header prior to serving the
resource.

An extension of HTML, known as XML, is currently
being developed. The process of rewriting URLs will be
similar for XML, with some differences in the mechanism
that parses the resource and identifies embedded URLs.
Handling Non-HTTP Protocols
This invention makes it possible to reflect references to
resources that are served by protocols other than HTTP, for
instance, the File Transfer Protocol (FTP) and audio/video
stream protocols. However, many protocols do not provide
the ability to redirect requests. It is, however, possible to
redirect references before requests are actually made by
rewriting URLs embedded in HTML pages. The following
modifications to the above algorithms are used to support
this capability.

In F4, the rewriter rewrites URLs for servers if those
servers appear in a configurable table of cooperating origin
servers, or so-called co-servers. The reflector operator can
define this table to include FTP servers and other servers. A
rewritten URL that refers to a non-HTTP resource takes the
form:

http://<repeater>/<origin server>@proxy=<scheme>[:<type>]@/resource

where <scheme> is a supported protocol name such as "ftp".
This URL format is an alternative to the form shown in B3.

In C3, the repeater looks for a protocol embedded in the
arriving request. If a protocol is present and the requested
resource is not already cached, the repeater uses the selected
protocol instead of the default HTTP protocol to request the
resource when serving it and storing it in the cache.
System Configuration and Management
In addition to the processing described above, the repeater
network requires various mechanisms for system configu-
ration and network management. Some of these mechanisms
are described here.

Reflectors allow their operators to synchronize repeater
caches by performing publishing operations. The process of
keeping repeater caches synchronized is described below.
Publishing indicates that a resource or collection of
resources has changed.

Repeaters and reflectors participate in various types of log
processing. The results of logs collected at repeaters are
collected and merged with logs collected at reflectors, as
described below.
Adding Subscribers to the Repeater Network
When a new subscriber is added to the network, infor-
mation about the subscriber is entered in a Subscriber Table
at the master repeater and propagated to all repeaters in the
network. This information includes the Committed Aggre-
gate Information Rate (CAIR) for servers belonging to the
subscriber, and a list of the repeaters that may be used by
servers belonging to the subscriber.
Adding Reflectors to the Repeater Network
When a new reflector is added to the network, it simply
connects to and announces itself to a contact repeater,
preferably using a securely encrypted certificate including
the repeater's subscriber identifier.

The contact repeater determines whether the reflector
network address is permitted for this subscriber. If it is, the
contact repeater accepts the connection and updates the
reflector with all necessary tables (using version numbers to
determine which tables are out of date).

The reflector processes requests during this time, but is
not "enabled" (allowed to reflect requests) until all of its
tables are current.
Keeping Repeater Caches Synchronized
Repeater caches are coherent, in the sense that when a
change to a resource is identified by a reflector, all repeater
caches are notified, and accept the change in a single
transaction.

Only the identifier of the changed resource (and not the
entire resource) is transmitted to the repeaters; the identifier
is used to effectively invalidate the corresponding cached
resource at the repeater. This process is far more efficient
than broadcasting the content of the changed resource to
each repeater.

A repeater will load the newly modified resource the next
time it is requested.

A resource change is identified at the reflector either
manually by the operator, or through a script when files are
installed on the server, or automatically through a change
detection mechanism (e.g., a separate process that checks
regularly for changes).

A resource change causes the reflector to send an "invali-
date" message to its contact repeater, which forwards the
message to the master repeater. The invalidate message
contains a list of resource identifiers (or regular expressions
identifying patterns of resource identifiers) that have
changed. (Regular expressions are used to invalidate a
directory or an entire server.) The repeater network uses a
two-phase commit process to ensure that all repeaters cor-
rectly invalidate a given resource.

The invalidation process operates as follows:

The master broadcasts a "phase 1" invalidation request to
all repeaters indicating the resources and regular expressions
describing sets of resources to be invalidated.

When each repeater receives the phase 1 message, it first
places the resource identifiers or regular expressions into a
list of resource identifiers pending invalidation.

Any resource requested (in C3) that is in the pending
invalidation list may not be served from the cache. This
prevents the cache from requesting the resource from a peer
cache which may not have received an invalidation notice.
Were it to request a resource in this manner, it might replace
the newly invalidated resource by the same, now stale, data.

The repeater then compares the resource identifier of each
resource in its cache against the resource identifiers and
regular expressions in the list.

Each match is invalidated by marking it stale and option-
ally removing it from the cache. This means that a future
request for the resource will cause it to retrieve a new copy
of the resource from the reflector.

When the repeater has completed the invalidation, it
returns an acknowledgment to the master. The master waits
until all repeaters have acknowledged the invalidation
request.

If a repeater fails to acknowledge within a given period,
it is disconnected from the master repeater. When it
reconnects, it will be told to flush its entire cache, which will
eliminate any consistency problem. (To avoid flushing the
entire cache, the master could keep a log of all invalidations
performed, sorted by date, and flush only files invalidated
since the last time the reconnecting repeater successfully
completed an invalidation. In the presently preferred
embodiments this is not done since it is believed that
repeaters will seldom disconnect.)

When all repeaters have acknowledged invalidation (or
timed out) the master broadcasts a "phase 2" invalidation
request to all repeaters. This causes the repeaters to remove
the corresponding resource identifiers and regular expres-
sions from the list of resource identifiers pending invalida-
tion.

In another embodiment, the invalidation request will be
extended to allow a "server push". In such requests, after
phase 2 of the invalidation process has completed, the
repeater receiving the invalidation request will immediately
request a new copy of the invalidated resource to place in its
cache.
Logs and Log Processing
Web server activity logs are fundamental to monitoring
the activity in a Web site. This invention creates "merged
logs" that combine the activity at reflectors with the activity
at repeaters, so that a single activity log appears at the origin
server showing all Web resource requests made on behalf of
that site at any repeater.

This merged log can be processed by standard processing
tools, as if it had been generated locally.

On a periodic basis, the master repeater (or its delegate)
collects logs from each repeater. The logs collected are
merged, sorted by reflector identifier and timestamp, and
stored in a dated file on a per-reflector basis. The merged log
for a given reflector represents the activity of all repeaters on
behalf of that reflector. On a periodic basis, as configured by
the reflector operator, a reflector contacts the master repeater
to request its merged logs. It downloads these and merges
them with its locally maintained logs, sorting by timestamp.
The result is a merged log that represents all activity on
behalf of repeaters and the given reflector.

Activity logs are optionally extended with information
important to the repeater network, if the reflector is config-
ured to do so by the reflector operator. In particular, an
"extended status code" indicates information about each
request such as:

time, and other quite useful information. The extended
precision timestamp makes it possible to accurately merge
activity logs.

Repeaters use the Network Time Protocol (NTP) to main-
tain synchronized clocks. Reflectors may either use NTP or
calculate a time bias to provide roughly accurate timestamps
relative to their contact repeater.
Enforcing Committed Aggregate Information Rate
The repeater network monitors and limits the aggregate
rate at which data is served on behalf of a given subscriber
by all repeaters. This mechanism provides the following
benefits:

1. provides a means of pricing repeater service;

2. provides a means for estimating and reserving capacity
at repeaters;

3. provides a means for preventing clients of a busy site
from limiting access to other sites.

For each subscriber, a "threshold aggregate information
rate" (TAIR) is configured and maintained at the master
repeater. This threshold is not necessarily the committed
rate; it may be a multiple of committed rate, based on a
pricing policy.

Each repeater measures the information rate component
of each reflector for which it serves resources, periodically
(typically about once a minute), by recording the number of
bytes transmitted on behalf of that reflector each time a
request is delivered. The table thus created is sent to the
master repeater once per period. The master repeater com-
bines the tables from each repeater, summing the measured
information rate of each reflector over all repeaters that serve
resources for that reflector, to determine the "measured
aggregate information rate" (MAIR) for each reflector.

If the MAIR for a given reflector is greater than the TAIR
for that reflector, the MAIR is transmitted by the master to
all repeaters and to the respective reflector.

When a reflector receives a request, it determines whether
its most recently calculated MAIR is greater than its TAIR.
If this is the case, the reflector probabilistically decides
whether to suppress reflection, by serving the request locally
(in B2). The probability of suppressing the reflection
increases as an exponential function of the difference
between MAIR and the CAIR.

Serving a request locally during a peak period may strain
the local origin server, but it prevents this subscriber from
taking more than its allotted bandwidth from the shared
repeater network.

When a repeater receives a request for a given subscriber
(in C2), it determines whether the subscriber is running near
its threshold aggregate information rate. If this is the case, it
probabilistically decides whether to reduce its load by

, . , „ , „ redirecting the request back to the reflector. The probability 

1. request was served by a reflector locally; ° „ .. „ . , r . . ' 

^ ' ' increases exponentially as the reflector s aggregate lnforma- 

2. request was reflected to a repealer;* U0Q ra(e approaches il8 UmiL 

3. request was served by a reflector to a repeater,* js if a request is reflected back to a reflector, a special 

4. request for non-repeatable resource was served by character siring is attached to the resource identifier so that 
repeater;* the receiving reflector will not attempt to reflect it again. In 

5. request was served by a repeater from the cache; the current system, this string has the form 

6. request was served by a repeater after filling cache; 

, . "src=ovcrload . 

7. request pending invalidation was served by a repeater. 60 

(The activities marked with "*" represent intermediate states The reflector tests for this string in B2. 

of a request and do not normally appear in a final activity The mechanism for limiting Aggregate Information Rate 

log.) described above is fairly coarse. It limits at the level of 

In addition, activity logs contain a duration, and extended sessions with clients (since once a client has been reflected 

precision timestamps. The duration makes it possible to 65 to a given repeater, the rewriting process tends to keep the 

analyze the lime required to serve a resource, the bandwidth client coming back to that repeater) and, at best, individual 

used, the number of requests handled in parallel at a given requests for resources. A more fine-grained mechanism for 
enforcing TAIR limits within repeaters operates by reducing
the bandwidth consumption of a busy subscriber when other
subscribers are competing for bandwidth.

The fine-grained mechanism is a form of data "rate
shaping". It extends the mechanism that copies resource data
to a connection when a reply is being sent to a client. When
an output channel is established at the time a request is
received, the repeater identifies which subscriber the channel
is operating for, in C2, and records the subscriber in a
data field associated with the channel. Each time a "write"
operation is about to be made to the channel, the Metered
Output Stream first inspects the current values of the MAIR
and TAIR, calculated above, for the given subscriber. If the
MAIR is larger than the TAIR, then the mechanism pauses
briefly before performing the write operation. The length of
the pause is proportional to the amount the MAIR exceeds
the TAIR. The pause ensures that tasks sending other
resources to other clients, perhaps on behalf of other
subscribers, will have an opportunity to send their data.

Repeater Network Resilience

The repeater network is capable of recovering when a
repeater or network connection fails.

A repeater cannot operate unless it is connected to the
master repeater. The master repeater exchanges critical
information with other repeaters, including information
about repeater load, aggregate information rate, subscribers,
and link cost.

If a master fails, a "succession" process ensures that
another repeater will take over the role of master, and the
network as a whole will remain operational. If a master fails,
or a connection to a master fails through a network problem,
any repeater attempting to communicate with the master will
detect the failure, either through an indication from TCP/IP,
or by timing out from a regular "heartbeat" message it sends
to the master.

When any repeater is disconnected from its master, it
immediately tries to reconnect to a series of potential
masters based on a configurable file called its "succession
list".

The repeater tries each system on the list in succession
until it successfully connects to a master. If in this process,
it comes to its own name, it takes on the role of master, and
accepts connections from other repeaters. If a repeater which
is not at the top of the list becomes the master, it is called the
"temporary master".

A network partition may cause two groups of repeaters
each to elect a master. When the partition is corrected, it is
necessary that the more senior master take over the network.
Therefore, when a repeater is temporary master, it regularly
tries to reconnect to any master above it in the succession
list. If it succeeds, it immediately disconnects from all of the
repeaters connected to it. When they retry their succession
lists, they will connect to the more senior master repeater.

To prevent losses of data, a temporary master does not
accept configuration changes and does not process log files.
In order to take on these tasks, it must be informed that it is
primary master by manual modification of its successor list.
Each repeater regularly reloads its successor list to determine
whether it should change its idea of who the master is.

If a repeater is disconnected from the master, it must
resynchronize its cache when it reconnects to the master.
The master can maintain a list of recent cache invalidations
and send to the repeater any invalidations it was not able to
process while disconnected. If this list is not available for
some reason (for instance, because the reflector has been
disconnected too long), the reflector must invalidate its
entire cache.

A reflector is not permitted to reflect requests unless it is
connected to a repeater. The reflector relies on its contact
repeater for critical information, such as load and Link Cost
Tables, and current aggregate information rate. A reflector
that is not connected to a repeater can continue to receive
requests and handle them locally.

If a reflector loses its connection with a repeater, due to a
repeater failure or network outage, it continues to operate
while it tries to connect to a repeater.

Each time a reflector attempts to connect to a repeater, it
uses DNS to identify a set of candidate repeaters given a
domain name that represents the repeater network. The
reflector tries each repeater in this set until it makes a
successful contact. Until a successful contact is made, the
reflector serves all requests locally. When a reflector connects
to a repeater, the repeater can tell it to attempt to
contact a different repeater; this allows the repeater network
to ensure that no individual repeater has too many contacts.

When contact is made, the reflector provides the version
number of each of its tables to its contact repeater. The
repeater then decides which tables should be updated and
sends appropriate updates to the reflector. Once all tables
have been updated, the repeater notifies the reflector that it
may now start reflecting requests.

Using a Proxy Cache within a Repeater

Repeaters are intentionally designed so that any proxy
cache can be used as a component within them. This is
possible because the repeater receives HTTP requests and
converts them to a form recognized by the proxy cache.

On the other hand, several modifications to a standard
proxy cache have been or may be made as optimizations.
This includes, in particular, the ability to conveniently
invalidate a resource, the ability to support cache quotas, and
the ability to avoid making an extra copy of each resource
as it passes from the proxy cache through the repeater to the
requester.

In a preferred embodiment, a proxy cache is used to
implement C3. The proxy cache is dedicated for use only by
one or more repeaters. Each repeater requiring a resource
from the proxy cache constructs a proxy request from the
inbound resource request. A normal HTTP GET request to a
server contains only the pathname part of the URL; the
scheme and server name are implicit. (In an HTTP GET
request to a repeater, the pathname part of the URL includes
the name of the origin server on behalf of which the request
is being made, as described above.) However, a proxy agent
GET request takes an entire URL. Therefore, the repeater
must construct a proxy request containing the entire URL
from the path portion of the URL it receives. Specifically, if
the incoming request takes the form:

GET /<origin server>/<path>

then the repeater constructs a proxy request of the form:

GET http://<origin server>/<path>

and if the incoming request takes the form:

GET /<origin server>@proxy=<scheme>:<type>@/<path>

then the repeater constructs a proxy request of the form:

GET <scheme>://<origin server>/<path>

Cache Control

HTTP replies contain directives called cache control
directives, which are used to indicate to a cache whether the
attached resource may be cached and if so, when it should
expire. A Web site administrator configures the Web site to
attach appropriate directives. Often, the administrator will 
not know how long a page will be fresh, and must define a 
short expiration time to try to prevent caches from serving 
stale data. In many cases, a Web site operator will indicate 
a short expiration time only in order to receive the requests
(or hits) that would otherwise be masked by the presence of 
a cache. This is known in the industry as "cache-busting". 
Although some cache operators may consider cache-busting 
to be impolite, advertisers who rely on this information may 
consider it imperative. 

When a resource is stored in a repeater, its cache direc- 
tives can be ignored by the repeater, because the repeater 
receives explicit invalidation events to determine when a 
resource is stale. When a proxy cache is used as the cache 
at the repeater, the associated cache directives may be 
temporarily disabled. However, they must be re-enabled 
when the resource is served from the cache to a client, in 
order to permit the cache-control policy (including any 
cache-busting) to take effect.

The present invention contains mechanisms to prevent the 
proxy cache within a repeater from honoring cache control
directives, while permitting the directives to be served from
the repeater. 

When a reflector serves a resource to a repeater in B4, it 
replaces all cache directives by modified directives that are 
ignored by the repeater proxy cache. It does this by prefixing
a distinctive string such as "wr-" to the beginning of the 
HTTP tag. Thus, "expires" becomes "wr-expires", and 
"cache-control" becomes "wr-cache-control". This prevents 
the proxy cache itself from honoring the directives. When a 
repeater serves a resource in C4, and the requesting client is 
not another repeater, it searches for HTTP tags beginning 
with "wr-" and removes the "wr-". This allows the clients 
retrieving the resource to honor the directives. 
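
The prefixing scheme above can be illustrated with a minimal sketch. It assumes headers are held in a plain dictionary and that only "expires" and "cache-control" are treated as cache directives; the function names and the exact header set are hypothetical and are not taken from the described system.

```python
# Hypothetical sketch of the "wr-" masking scheme; names are illustrative only.
CACHE_HEADERS = {"expires", "cache-control"}  # assumed set of cache directives
PREFIX = "wr-"


def mask_cache_headers(headers):
    """Reflector side (B4): prefix cache directives so the proxy cache ignores them."""
    return {
        (PREFIX + name if name.lower() in CACHE_HEADERS else name): value
        for name, value in headers.items()
    }


def unmask_cache_headers(headers, client_is_repeater=False):
    """Repeater side (C4): strip the prefix unless the requester is another repeater."""
    if client_is_repeater:
        return dict(headers)
    return {
        (name[len(PREFIX):] if name.lower().startswith(PREFIX) else name): value
        for name, value in headers.items()
    }
```

Leaving the prefix in place for repeater-to-repeater transfers keeps the receiving repeater's proxy cache from honoring the directives, while an end client still sees the original header names.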

Resource Revalidation 

There are several cases where a resource may be cached 
so long as the origin server is consulted each time it is 
served. In one case, the request for the resource is attached 
to a so-called "cookie". The origin server must be presented 
with the cookie to record the request and determine whether 
the cached resource may be served or not. In another case,
the request for the resource is attached to an authentication 
header (which identifies the requester with a user id and 
password). Each new request for the resource must be tested 
at the origin server to assure that the requester is authorized 
to access the resource. 

The HTTP 1.1 specification defines a reply header titled 
"Must-Revalidate" which allows an origin server to instruct 
a proxy cache to "revalidate" a resource each time a request 
is received. Normally, this mechanism is used to determine 
whether a resource is still fresh. In the present invention, 
Must-Revalidate makes it possible to ask an origin server to 
validate a request that is otherwise served from a repeater. 

The reflector rule base contains information that deter- 
mines which resources may be repeated but must be revali- 
dated each time they are served. For each such resource, in 
B4, the reflector attaches a Must-Revalidate header. Each 
time a request comes to a repeater for a cached resource 
marked with a Must-Revalidate header, the request is for- 
warded to the reflector for validation prior to serving the 
requested resource. 
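
As a rough illustration of this revalidation step, the sketch below assumes a cached entry is a dictionary with a body and a must_revalidate flag, and that validation is delegated to a caller-supplied function standing in for the round trip to the reflector. Returning None on a failed validation is an assumption made for the example, not behavior stated in the text.

```python
# Hypothetical sketch; the entry layout and callback are assumptions.
def serve_cached(entry, request, validate_with_reflector):
    """Serve a cached resource, consulting the reflector first when required."""
    if entry.get("must_revalidate"):
        # Forward the request (with any cookie or authentication header)
        # to the reflector before the cached copy may be served.
        if not validate_with_reflector(request):
            return None  # assumed outcome: the cached copy is not served
    return entry["body"]
```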

Cache Quotas 

The cache component of a repeater is shared among those 
subscribers that reflect clients to that repeater. In order to 
allow subscribers fair access to storage facilities, the cache 
may be extended to support quotas. 

Normally, a proxy cache may be configured with a disk
space threshold T. Whenever more than T bytes are stored in
the cache, the cache attempts to find resources to eliminate.
Typically a cache uses the least-recently-used (LRU)
algorithm to determine which resources to eliminate; more
sophisticated caches use other algorithms. A cache may also
support several threshold values: for instance, a lower
threshold which, when reached, causes a low priority
background process to remove items from the cache, and a higher
threshold which, when reached, prevents resources from
being cached until sufficient free disk space has been
reclaimed.

If two subscribers A and B share a cache, and more
resources of subscriber A are accessed during a period of
time than resources of subscriber B, then fewer of B's
resources will be in the cache when new requests arrive. It
is possible that, due to the behavior of A, B's resources will
never be cached when they are requested. In the present
invention, this behavior is undesirable. To address this issue,
the invention extends the cache at a repeater to support cache
quotas.

The cache records the amount of space used by each
subscriber in D_s, and supports a configurable threshold T_s
for each subscriber.

Whenever a resource is added to the cache (at C3), the
value D_s is updated for the subscriber providing the
resource. If D_s is larger than T_s, the cache attempts to find
resources to eliminate, from among those resources associated
with subscriber S. The cache is effectively partitioned
into separate areas for each subscriber.

The original threshold T is still supported. If the sum of 
reserved segments for each subscriber is smaller than the 
total space reserved in the cache, the remaining area is 
"common" and subject to competition among subscribers. 

Note, this mechanism might be implemented by modifying
the existing proxy cache discussed above, or it might
also be implemented without modifying the proxy cache, if
the proxy cache at least makes it possible for an external
program to obtain a list of resources in the cache, and to
remove a given resource from the cache.
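
The per-subscriber quota bookkeeping described in this section might look roughly like the following sketch. It tracks usage and a threshold for each subscriber and evicts that subscriber's least recently used entries when the quota is exceeded. The class and method names are hypothetical, LRU is only one of the eviction policies the text allows, the overall threshold T and the shared "common" area are not modeled, and keeping the newest entry even when it alone exceeds the quota is an assumption.

```python
from collections import OrderedDict


class QuotaCache:
    """Hypothetical sketch of a cache partitioned by subscriber quota."""

    def __init__(self, quotas):
        self.quotas = dict(quotas)                        # threshold per subscriber
        self.used = {s: 0 for s in quotas}                # bytes used per subscriber
        self.items = {s: OrderedDict() for s in quotas}   # key -> size, oldest first

    def add(self, subscriber, key, size):
        entries = self.items[subscriber]
        if key in entries:
            # Replacing an entry: release its old size first.
            self.used[subscriber] -= entries.pop(key)
        entries[key] = size
        self.used[subscriber] += size
        # Evict only this subscriber's resources, least recently used first.
        while self.used[subscriber] > self.quotas[subscriber] and len(entries) > 1:
            _, evicted_size = entries.popitem(last=False)
            self.used[subscriber] -= evicted_size

    def touch(self, subscriber, key):
        """Mark an entry as recently used when it is served."""
        self.items[subscriber].move_to_end(key)
```

Because eviction never crosses subscriber boundaries, heavy traffic for one subscriber cannot push another subscriber's resources out of its reserved share.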
Rewriting from Repeaters 

When a repeater receives a request for a resource, its
proxy cache may be configured to determine whether a peer
cache contains the requested resource. If so, the proxy cache
obtains the resource from the peer cache, which can be faster
than obtaining it from the origin server (the reflector).
However, a consequence of this is that rewritten HTML
resources retrieved from the peer cache would identify the
wrong repeater. Thus, to allow for cooperating proxy caches,
resources are preferably rewritten at the repeater.

When a resource is rewritten for a repeater, a special tag
is placed at the beginning of the resource. When constructing
a reply, the repeater inspects the tag to determine
whether the resource indicates that additional rewriting is
necessary. If so, the repeater modifies the resource by
replacing references to the old repeater with references to the
new repeater.

It is only necessary to perform this rewriting when a 
resource is served to the proxy cache at another repeater. 
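
A minimal sketch of the tag check follows. The text does not give the tag format, so the HTML comment used here, the host names in the usage, and the function names are all hypothetical; the sketch only shows the idea of reading the tag, comparing it with the serving repeater, and replacing references when they differ.

```python
# Hypothetical tag format; the actual tag is not specified in the text.
TAG_PREFIX = "<!-- rewritten-for: "
TAG_SUFFIX = " -->\n"


def tag_resource(body, repeater):
    """Place a tag naming the repeater at the beginning of the resource."""
    return f"{TAG_PREFIX}{repeater}{TAG_SUFFIX}{body}"


def rewrite_for(resource, this_repeater):
    """Re-target a tagged resource at this repeater if it names another one."""
    if not resource.startswith(TAG_PREFIX):
        return resource  # untagged: nothing to do
    end = resource.find(TAG_SUFFIX)
    if end == -1:
        return resource  # malformed tag: leave the resource unchanged
    old_repeater = resource[len(TAG_PREFIX):end]
    body = resource[end + len(TAG_SUFFIX):]
    if old_repeater != this_repeater:
        # Replace references to the old repeater with the new one.
        body = body.replace(f"//{old_repeater}/", f"//{this_repeater}/")
    return tag_resource(body, this_repeater)
```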

Repeater-Side Include

Sometimes, an origin server constructs a custom resource
for each request (for instance, when inserting an advertisement
based on the history of the requesting client). In such
a case, that resource must be served locally rather than
repeated. Generally, a custom resource contains, along with
the custom information, text and references to other,
repeatable, resources.

The process that assembles a "page" from a text resource
and possibly one or more image resources is performed by
the Web browser, directed by HTML. However, it is not
possible using HTML to cause a browser to assemble a page
using text or directives from a separate resource. Therefore,
custom resources often necessarily contain large amounts of
static text that would otherwise be repeatable.

To resolve this potential inefficiency, repeaters recognize 
a special directive called a "repeater side include". This 
directive makes it possible for the repeater to assemble a 
custom resource, using a combination of repeatable and 
local resources. In this way, the static text can be made 
repeatable, and only the special directive need be served 
locally by the reflector. 

For example, a resource X might consist of custom
directives selecting an advertising banner, followed by a
large text article. To make this resource repeatable, the Web
site administrator must break out a second resource, Y, to
select the banner. Resource X is modified to contain a
repeater-side include directive identifying resource Y, along
with the article. Resource Y is created and contains only the
custom directives selecting an ad banner. Now resource X is
repeatable, and only resource Y, which is relatively small, is
not repeatable.

When a repeater constructs a reply, it determines whether 
the resource being served is an HTML resource, and if so,
scans it for repeater-side include directives. Each such 
directive includes a URL, which the repeater resolves and 
substitutes in place of the directive. The entire resource must 
be assembled before it is served, in order to determine its 
final size, as the size is included in a reply header ahead of 
the resource. 
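
The assembly step can be sketched as follows. The directive syntax shown (an HTML comment carrying a url attribute) is purely hypothetical, since the text does not define it, and fetch stands in for whatever mechanism the repeater uses to resolve the URL. The sketch substitutes each directive and computes the size only after the whole resource has been assembled.

```python
import re

# Hypothetical directive syntax; the real syntax is not given in the text.
INCLUDE = re.compile(r'<!--#repeater-include url="([^"]+)" -->')


def assemble(html, fetch):
    """Resolve every include directive, then compute the reply size."""
    body = INCLUDE.sub(lambda match: fetch(match.group(1)), html)
    # The size is known only after all substitutions have been made.
    headers = {"Content-Length": str(len(body.encode("utf-8")))}
    return headers, body
```

With a resource X containing one directive that names resource Y, fetch is called once with Y's URL and its result takes the place of the directive in the reply.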

Thus, a method and apparatus for dynamically replicating 
selected resources in computer networks is provided. One 
skilled in the art will appreciate that the present invention 
can be practiced by other than the described embodiments,
which are presented for purposes of illustration and not 
limitation, and the present invention is limited only by the 
claims that follow. 

What is claimed: 

1. A method of processing resource requests in a computer
network, the method comprising, 

(i) by a client: 

(A) making a request for a particular resource from an 
origin server, the request including a resource iden- 
tifier for the particular resource; 

(ii) by a reflector: 

(B) intercepting the request from the client to the origin 
server; 

(C) selecting a repeater to process the request, wherein 
the repeater is selected based on a predicted cost or
speed of transmission between the repeater and the 
client; 

(D) providing to the client a modified resource identi- 
fier designating the repeater; 

(iii) by the client:

(E) receiving the modified resource identifier from the 
reflector; and 

(F) making a request for the particular resource from 
the repeater designated in the modified resource 
identifier;

(iv) by the repeater: 

(G) receiving the request from the client; and 

(H) returning the requested resource to the client. 

2. A method as in claim 1 further comprising, by the 
repeater:

(I) making a request for the resource from the origin 
server; and 
(J) receiving the resource from the origin server.

3. A method as in claim 1 wherein the selecting of a
repeater by the reflector comprises:

(C1) partitioning the network into groups;

(C2) determining which group the client is in; 

(C3) selecting, from a plurality of repeaters in the 
network, a set of repeaters having a lowest cost relative 
to the group which the client is in; and 

(C4) selecting as the repeater a member of the selected set
of repeaters. 

4. A method as in claim 3, wherein the cost of a repeater 
is a value based on that repeater's current load and a 
maximum load for that repeater. 

5. A method as in claim 3, wherein the cost of a repeater 
is a value based on a predicted cost or speed of transmission 
between the repeater and a client in the group. 

6. A method as in claim 1 wherein the particular resource 
itself contains at least one other resource identifier of at least 
one other resource, the method further comprising: 

rewriting the particular resource to replace at least some 
of the resource identifiers contained therein with modi- 
fied resource identifiers designating a repeater instead 
of the origin server. 

7. A method as in claim 6 wherein the rewriting is 
performed by one of the repeater, the reflector or another 
repeater. 

8. A method of processing resource requests in a computer 
network, the method comprising, 

(i) by a client: 

(A) making a request for a particular resource from an 
origin server, the request including a resource iden- 
tifier for the particular resource; 

(ii) by a reflector: 

(B) intercepting the request from the client to the origin 
server; 

(C) determining whether to reflect the request to a 
repeater; 

(D) when the reflector determines not to reflect the 
request, forwarding the request to the origin server, 
otherwise 

(D1) selecting a repeater to process the request,
wherein the repeater is selected based on a pre- 
dicted cost or speed of transmission between the 
repeater and the client; 

(D2) providing to the client a modified resource 
identifier designating the repeater. 

9. A method as in claim 8, further comprising, when the 
reflector determines to reflect the request, 

(iii) by the client: 

(E) receiving the modified resource identifier from the 
reflector; and 

(F) making a request for the particular resource from 
the repeater designated in the modified resource 
identifier; 

(iv) by the repeater: 

(G) receiving the request from the client; and 

(H) returning the requested resource to the client. 

10. A method as in claim 8 wherein the reflector deter- 
mines whether to reflect a request by comparing the resource 
identifier with regular expression patterns of repeatable 
resources. 

11. A method as in claim 8, wherein the reflector has a 
threshold aggregate information rate (TAIR) associated 
therewith, and wherein the determining of whether to reflect 
the request to a repeater comprises: 

determining whether the TAIR of the reflector is exceeded 
by a measured aggregate information rate (MAIR) for 
the reflector, wherein the reflector determines not to
reflect the request when the MAIR exceeds the TAIR
for the reflector.

12. A method as in claim 8, wherein the reflector has a
threshold aggregate information rate (TAIR) associated
therewith, and wherein the determining of whether to reflect
the request to a repeater comprises:

probabilistically determining whether the TAIR of the
reflector is exceeded by a measured aggregate information
rate (MAIR) for the reflector, wherein the
reflector determines not to reflect the request as an
exponential function of the difference between the
MAIR and the TAIR.

13. A method as in any of claims 11-12, wherein the 
MAIR is obtained from repeaters according to the rate at 
which they have transmitted data on behalf of the reflector
during a given time interval. 

14. A method as in any one of claims 1-12 wherein the
network is the Internet and wherein the resource identifier is
a uniform resource locator (URL) for designating resources
on the Internet, and wherein the modified resource identifier
is a URL designating the repeater and indicating the reflector
or origin server, and wherein the modified resource identifier
is provided to the client using a REDIRECT message.

15. In a computer network wherein clients request
resources from origin servers, a method comprising:

providing at least one repeater;

providing reflectors at some of the origin servers, each
reflector intercepting client resource requests made to
its respective origin server; and

each reflector selectively redirecting client resource
requests for certain resources to one of the repeaters,
wherein a reflector determines whether or not to redirect
a client resource to a repeater based on a predicted
cost or speed of transmission between the repeater and
the client making the resource request.

16. A method as in claim 15 further comprising, by
repeaters in the network:

servicing redirected client resource requests; and

selectively maintaining copies of requested resources,
whereby resources corresponding to redirected resource
requests are selectively migrated from their origin
servers to one or more repeaters.

17. A computer network comprising:

a plurality of origin servers, at least some of the origin
servers having reflectors associated therewith;

a plurality of repeaters; and

a plurality of clients,

wherein each reflector is adapted to intercept resource
requests made to its respective origin server and to
selectively redirect the resource requests to a dynamically
selected repeater, wherein a repeater is selected
based on a predicted cost or speed of transmission
between the repeater and the client making the resource
request.

18. In a computer network wherein clients request
resources from origin servers, a reflector mechanism associated
with an origin server, the reflector mechanism comprising:

means for intercepting a resource request made by client
of an origin server;

means for analyzing the resource request to determine
whether to service the request locally at the origin
server;

means for determining a best repeater in the network to
service the request when the analyzing means determines
that the request should not be serviced locally;
and

means for redirecting the client to the best repeater,
wherein a repeater is selected based on a predicted cost
or speed of transmission between the repeater and the
client making the resource request.

19. A reflector mechanism as in claim 18 wherein the
network is partitioned into groups and the means for determining
the best repeater comprises:

means for determining which group the client is in;

means for selecting, from a plurality of repeaters in the
network, a set of repeaters having a lowest cost relative
to the group the client is in; and

means for selecting as the best repeater a member of the
set of repeaters.

20. A reflector mechanism as in claim 19, wherein the cost 
of a repeater is a value based on a predicted cost or speed of 
transmission between the repeater and a client in the group. 

21. A mechanism as in claim 19, wherein the cost of a 
repeater is a value based on that repeater's current load and
a maximum load for that repeater. 

22. A reflector as in claim 18 wherein the resource itself 
contains resource identifiers, the repeater further compris- 
ing: 

means for rewriting the resource to replace at least some 
of the resource identifiers contained therein with modi- 
fied resource identifiers designating the repeater instead 
of the origin server. 

23. A reflector as in claim 18 wherein the resource itself 
contains resource identifiers, the reflector further compris- 
ing: 

means for rewriting the resource to replace at least some 
of the resource identifiers contained therein with modi- 
fied resource identifiers designating the best repeater 
instead of the origin server. 

24. In a computer network wherein clients request 
resources from origin servers, a repeater mechanism com- 
prising: 

means for receiving a resource request from a client; 
means for determining whether the resource is available 
locally; 

means for, when it is determined that the resource is not 
available locally, obtaining the resource from an origin 
server, wherein the origin server is selected based on a 
predicted cost or speed of transmission between the 
origin server and the client; and 

means for providing the resource to the client. 

25. A method of processing resource requests in a com- 
puter network, the method comprising, 

by a client, making a request for a particular resource 
from an origin server, the request including a resource 
identifier for the particular resource, and wherein the 
particular resource itself contains at least one other 
resource identifier of at least one other resource; 

a reflector intercepting the request from the client to the 
origin server; 

selecting a repeater to process the request, wherein the 
repeater is selected based on a predicted cost or speed 
of transmission between the repeater and the client; 

rewriting the particular resource to replace at least some 
of the resource identifiers contained therein with modi- 
fied resource identifiers designating a repeater instead 
of the origin server; 

providing to the client a modified resource identifier 
designating the repeater; 
the client receiving the modified resource identifier from
the reflector; and

making a request for the particular resource from the
repeater designated in the modified resource identifier;

the repeater receiving the request from the client; and

returning the requested resource to the client. 

26. A method as in claim 25 wherein the rewriting is 
performed by one of the repeater, the reflector or another 
repeater. 

27. A method as in any one of claims 25-26 wherein the
network is the Internet and wherein the resource identifier is 
a uniform resource locator (URL) for designating resources 
on the Internet, and wherein the modified resource identifier 
is a URL designating the repeater and indicating the reflector 
or origin server, and wherein the modified resource identifier 
is provided to the client using a REDIRECT message. 

METHOD AND APPARATUS FOR
DISPATCHING DOCUMENT REQUESTS IN A 
PROXY 

CROSS-REFERENCE TO RELATED 
APPLICATIONS 
The present application is a divisional of U.S. patent 
application entitled "Method and Apparatus for Providing 
Remote Site Administrators with User Hits on Mirrored Web 
Sites," having application Ser. No. 08/827,643, and filed on 
Apr. 9, 1997, now U.S. Pat. No. 5,935,207 which is a 
continuation-in-part of U.S. patent application entitled, 
"Method and Apparatus for Providing Proxying and 
Transcoding of Documents in a Distributed Network," hav- 
ing application Ser. No. 08/656,924, and filed on Jun. 3, 
1996 now U.S. Pat. No. 5,918,013. The foregoing patents 
and patent applications are incorporated herein by reference. 

BACKGROUND OF THE INVENTION 

1. The Field of the Invention 

The present invention relates generally to the field of 
client-server computer networking. More specifically, the 
invention relates to a method and apparatus for dispatching 
document requests in a proxy. 

2. The Prior State of the Art 

World Wide Web (Web) documents are commonly written 
in Hypertext Mark-up Language (HTML). HTML docu- 
ments typically reside on Web servers and are requested by 
Web clients. Often, delays can be introduced during Web 
browsing, for example, by heavy communications traffic on 
the Internet or by a slow response of a remote site. Providing 
one or more servers for mirroring Web sites located on 
remote servers is one means of reducing delays involved 
with browsing the Web. These mirroring servers, typically 
referred to collectively as a "proxy" or individually as 
"proxy servers," store frequently accessed Web sites in a 
local cache, thereby eliminating recurrent retrievals of com- 
monly accessed documents. Thus, when a request for a 
particular Web page is received from a client, the proxy 
server associated with the particular client looks first to its 
local cache to service the request rather than the remote site 
upon which the Web page resides. If the requested document 
is found locally, the request can be serviced by the proxy 
server and a subsequent request to the remote server for the 
document can be avoided. Therefore, only when a valid copy 
of the requested document is not in the proxy's local cache 
would the remote server need to be accessed. In this manner, 
exposure to heavy communications traffic on the Internet 
and slow responses of remote servers can be reduced.

While this mirroring approach is beneficial to end-users, 
the proxy's cache space is inefficiently allocated in current 
mirroring technology. Currently, each client is assigned to 
one or more proxy servers. Therefore, the documents most 
recently requested by each active client will reside in the 
corresponding proxy server's cache. Assuming one or more 
clients assigned to different proxy servers have requested the 
same document recently, the same document might be 
cached in several of the proxy servers, thereby reducing the 
cache storage space for other frequently requested docu- 
ments. Further, one or more extremely popular documents 
might potentially be cached in each proxy server. While 
redundancy of information is useful for fault tolerance, 
organized redundancy would be preferable. Given the 
foregoing, what is needed is a means of more efficiently 
allocating cache space within a proxy. Specifically, it would 
be desirable to allocate mutually exclusive portions of the 
Web's content to particular proxy servers. 

SUMMARY AND OBJECTS OF THE
INVENTION 

A method is described for dispatching document requests
in a proxy to more efficiently allocate the document cache
space within the proxy. A proxy includes a document cache
storing recently requested documents. The proxy is coupled
to a client and to a remote server. The proxy implements a
dispatching scheme for client requests that results in a more
efficient allocation of the proxy's document cache space.
The proxy receives a document request from a client. A
Uniform Resource Locator (URL) is included in the document
request. The proxy forwards the request to one of a
plurality of proxy servers based upon the URL.

According to another aspect of the present invention, the
proxy performs a hash function on the URL that maps the
URL to exactly one of the plurality of proxy servers.
Advantageously, in this manner, mutually exclusive portions
of the Web's content can be allocated to particular proxy
servers.
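
By way of illustration only, the URL-to-server mapping summarized above might be sketched as follows. The sketch assumes an MD5 digest (a hash algorithm named later in the description) reduced modulo the number of servers; the modulo step, the function name pick_proxy_server, and the server names are assumptions made for this example and are not taken from the specification.

```python
import hashlib
from typing import List


def pick_proxy_server(url: str, servers: List[str]) -> str:
    """Map a URL to exactly one entry of `servers` (hypothetical helper)."""
    if not servers:
        raise ValueError("at least one proxy server is required")
    # Hash the URL; the same URL always produces the same digest.
    digest = hashlib.md5(url.encode("utf-8")).digest()
    # Assumption of this sketch: treat the 128-bit digest as an integer
    # and reduce it modulo the number of servers to obtain an index.
    index = int.from_bytes(digest, "big") % len(servers)
    return servers[index]


# Hypothetical server names, used only for illustration.
servers = ["proxy-a", "proxy-b", "proxy-c"]
chosen = pick_proxy_server("http://www.companyA.com/product/Gizmo/", servers)
```

Because the mapping depends only on the URL, every request for a given document would be sent to the same server, so each document would tend to be cached in one place. A simple modulo reduction remaps most URLs whenever the number of servers changes, which a deployment might or might not find acceptable.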

BRIEF DESCRIPTION OF THE DRAWINGS 

The present invention is illustrated by way of example,
and not by way of limitation, in the figures of the accompanying
drawings and in which like reference numerals refer
to similar elements and in which:

FIG. 1 is a block diagram illustrating several clients
connected to a proxy server in a network.

FIG. 2 is a diagram illustrating a client according to one
embodiment of the present invention.

FIG. 3 is a block diagram of a server according to one
embodiment of the present invention.

FIG. 4 is a data flow diagram illustrating the interaction
of proxy components according to one embodiment of the
present invention.

FIG. 5A is a depiction of an exemplary site tracking list
according to one embodiment of the present invention.

FIG. 5B is a depiction of an exemplary per site hit
database according to one embodiment of the present invention.

FIG. 6 is a logical view of an exemplary directory
structure of a remote server.

FIG. 7 is a flow diagram illustrating a method of performing
hit accumulation according to one embodiment of
the present invention.

FIG. 8 is a flow diagram illustrating a method of hit
reporting according to one embodiment of the present invention.

FIG. 9 is a data flow diagram illustrating the interaction
of proxy components according to another embodiment of
the present invention.

FIG. 10 is a flow diagram illustrating a method of
dispatching requests to segregate the storage of documents
according to one embodiment of the present invention.

DETAILED DESCRIPTION OF THE 
PREFERRED EMBODIMENTS 

A method and apparatus are described for maintaining a 
more efficient document caching scheme in a client-server 
computer network. In the following description, for pur- 
poses of explanation, numerous specific details are set forth 
in order to provide a thorough understanding of the present
invention. It will be evident, however, to one skilled in the 
art that the present invention may be practiced without these 
specific details. Further, in other instances, well-known
structures and devices are shown in block diagram.

The present invention includes various steps, which will
be described below. The steps can be embodied in machine-executable
instructions, which can be used to cause a
general-purpose or special-purpose processor programmed
with the instructions to perform the steps. Alternatively, the
steps of the present invention might be performed by specific
hardware components that contain hardwired logic for
performing the steps, or by any combination of programmed
computer components and custom hardware components.

While embodiments of the present invention will be
described with respect to HTML documents, the method and
apparatus described herein are equally applicable to other
types of documents such as text files, images (e.g., JPEG and
GIF), audio files (e.g., .WAV, .AU, and .AIFF), video files
(e.g., .MOV and .AVI), and other document types commonly
found on the Web.

System Overview

The present invention may be included in a system,
known as WebTV™, for providing a user with access to the
Internet. A user of a WebTV™ client generally accesses a
WebTV™ server via a direct-dial telephone (POTS, for
"plain old telephone service"), ISDN (Integrated Services
Digital Network), or other similar connection, in order to
browse the Web, send and receive electronic mail (e-mail),
and use various other WebTV™ network services. The
WebTV™ network services are provided by WebTV™
servers using software residing within the WebTV™ servers
in conjunction with software residing within a WebTV™
client.

FIG. 1 illustrates a basic configuration of the WebTV™
network according to one embodiment. A number of
WebTV™ clients 1 are coupled to a modem pool 2 via
direct-dial, bi-directional data connections 29, which may be
telephone (POTS, i.e., "plain old telephone service"), ISDN
(Integrated Services Digital Network), or any other similar
type of connection. The modem pool 2 is coupled typically
through a router, such as that conventionally known in the
art, to a number of remote servers 4 via a conventional
network infrastructure 3, such as the Internet. The WebTV™
system also includes a WebTV™ server 5, which specifically
supports the WebTV™ clients 1. The WebTV™ clients
1 each have a connection to the WebTV™ server 5 either
directly or through the modem pool 2 and the Internet 3.
Note that the modem pool 2 is a conventional modem pool,
such as those found today throughout the world providing
access to the Internet and private networks.

Note that in this description, in order to facilitate
explanation, the WebTV™ server 5 is generally discussed as
if it were a single device, and functions provided by the
WebTV™ services are generally discussed as being performed
by such single device. However, the WebTV™
server 5 may actually comprise multiple physical and logical
devices connected in a distributed architecture, and the
various functions discussed below which are provided by the
WebTV™ services may actually be distributed among multiple
WebTV™ server devices.

An Exemplary Client System

FIG. 2 illustrates a WebTV™ client 1. The WebTV™
client 1 includes an electronics unit 10 (hereinafter referred
to as "the WebTV™ box 10"), an ordinary television set 12,
and a remote control 11. In an alternative embodiment of the
present invention, the WebTV™ box 10 is built into the
television set 12 as an integral unit. The WebTV™ box 10
includes hardware and software for providing the user with
a graphical user interface, by which the user can access the
WebTV™ network services, browse the Web, send e-mail,
and otherwise access the Internet. The WebTV™ client 1
uses the television set 12 as a display device. The WebTV™
box 10 is coupled to the television set 12 by a video link 6.
The video link 6 is an RF (radio frequency), S-video,
composite video, or other equivalent form of video link. In
the preferred embodiment, the client 1 includes both a
standard modem and an ISDN modem, such that the communication
link 29 between the WebTV™ box 10 and the
server 5 can be either a telephone (POTS) connection 29a or
an ISDN connection 29b. The WebTV™ box 10 receives
power through a power line 7.

Remote control 11 is operated by the user in order to
control the WebTV™ client 1 in browsing the Web, sending
e-mail, and performing other Internet-related functions. The
WebTV™ box 10 receives commands from remote control
11 via an infrared (IR) communication link. In alternative
embodiments, the link between the remote control 11 and the
WebTV™ box 10 may be RF or any equivalent mode of
transmission.

An Exemplary Server System

The WebTV™ server 5 generally includes one or more
computer systems generally having the architecture illustrated
in FIG. 3. It should be noted that the illustrated
architecture is only exemplary; the present invention is not
constrained to this particular architecture. The illustrated
architecture includes a central processing unit (CPU) 50,
random access memory (RAM) 51, read-only memory
(ROM) 52, a mass storage device 53, a modem 54, a network
interface card (NIC) 55, and various other input/output (I/O)
devices 56. Mass storage device 53 includes a magnetic,
optical, or other equivalent storage medium. I/O devices 56
may include any or all of devices such as a display monitor,
keyboard, cursor control device, etc. Modem 54 is used to
communicate data to and from remote servers 4 via the
Internet.

As noted above, the WebTV™ server 5 may actually
comprise multiple physical and logical devices connected in
a distributed architecture. Accordingly, NIC 55 is used to
provide data communication with other devices that are part
of the WebTV™ services. Modem 54 may also be used to
communicate with other devices that are part of the
WebTV™ services and which are not located in close
geographic proximity to the illustrated device.

Exemplary Proxy

FIG. 4 illustrates the caching and hit accumulation features
of the WebTV™ proxy 400 according to one embodiment
of the present invention. In this embodiment, one or
more WebTV™ servers 5 [illegible] providing the WebTV™
[illegible] other WebTV™ services [illegible] specifically,
WebTV™ server functions as a caching proxy. In this example,
proxy 400 includes a proxy server 405 and a hit accumulator
server 415. Client requests that are serviced from the proxy
server's local document cache 465 are communicated to the
hit accumulator 415. As will be described below, the
hit accumulator server 415 maintains and organizes the data
so as to provide hit tracking information to remote site
administrators such as remote site administrator 480.
Remote site administrator 480 may include entities such as
persons authorized to gather statistical data for the remote
site, persons authorized to manage and maintain the remote
site, the remote site itself, or an automated computer system
or other device configured to receive statistical data for the
remote site.

In this embodiment, the proxy server 405 includes a proxy
request processor 410, a document cache 465, a document
database 461, and a transcoder 466. The proxy request
processor 410 receives requests from the WebTV™ client 1
and sends responses to the WebTV™ client 1. The proxy
request processor 410 maintains the document database 461,
the document cache 465, and further determines when
transcoding will be performed. The document cache 465 is
used for temporary storage of Web documents such as
images, text files, audio files, video files and other information
which is used frequently by either WebTV™ client 1 or
the proxy server 405.

When a document request is received from a client, the 
proxy request processor 410 determines whether to service 
the request from the document cache 465 by performing a 
search of the document cache 465. If the document is found 
locally, then the document may be retrieved from the docu- 
ment cache 465 and transferred to the client with the 
response. However, if the requested document is not found, 
then the proxy request processor 410 requests the document 
from the appropriate site and upon receipt the proxy request 
processor 410 provides the document to the client with the 
response. Further, the proxy request processor 410 antici- 
pates subsequent requests by storing the document in the 
document cache 465. 
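
The request handling just described (search the cache, fetch from the remote site on a miss, then store the document for later requests) can be illustrated with a minimal sketch. The class name, the in-memory dictionary standing in for document cache 465, and the fetch_from_origin callable are hypothetical stand-ins chosen for this example; cache validity checks, eviction, and transcoding are omitted.

```python
from typing import Callable, Dict


class ProxyRequestProcessor:
    """Illustrative stand-in for a proxy request processor (not the actual design)."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        # Dictionary keyed by URL, standing in for the document cache.
        self.document_cache: Dict[str, bytes] = {}
        # Hypothetical callable that retrieves a document from its remote site.
        self.fetch_from_origin = fetch_from_origin

    def handle_request(self, url: str) -> bytes:
        # Search the local cache first.
        cached = self.document_cache.get(url)
        if cached is not None:
            return cached
        # Cache miss: request the document from the appropriate site.
        document = self.fetch_from_origin(url)
        # Store it in anticipation of subsequent requests.
        self.document_cache[url] = document
        return document
```

With this arrangement, a second request for the same URL would be answered from the dictionary without calling fetch_from_origin again.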

When a document is retrieved by the proxy server 405 
from a remote server 4, for example, detailed information on 
this document may be stored in the document database 461. 
The stored information may subsequently be used by the 
proxy server 405 to speed up processing and downloading of 
that document in response to future requests for that docu- 
ment. In addition, the transcoding functions and various 
other functions of the WebTV™ service may be facilitated 
by making use of information stored in the document 
database 461. For example, the document database 461 may 
include certain historical and diagnostic information for 
Web pages that have been accessed by a WebTV™ client 1. 

Document transcoder 466 is used to automatically revise 
the code of Web documents retrieved from the remote 
servers 4, for purposes such as: (1) correcting bugs in 
documents; (2) correcting undesirable effects which occur 
when a document is displayed by the client 1; (3) improving 
the efficiency of transmission of documents from the server 
5 to the client 1; (4) matching hardware decompression 
technology within the client 1; (5) resizing images to fit on 
the television set 12; (6) converting documents into other 
formats to provide compatibility; (7) reducing latency expe- 
rienced by a client 1 when displaying a Web page with 
in-line images (images displayed in text); and (8) altering 
documents to fit into smaller memory spaces. 

In one embodiment, hit accumulator server 415 may act
as a Web server providing a Hypertext Transfer Protocol
(HTTP) interface by which remote site administrators can
access the accumulated hits for their sites by way of a Web
browser. The hit accumulator server 415 may include a hit
log 420, a hit accumulator processor 430, a site tracking list
425, a hit report processor 450, and a per site hit database
440. One method of communicating hits from a given proxy
server to the hit accumulator server 415 is through a common
storage device such as hit log 420. This and other
methods of communicating hits will be described below.
Regardless of how hits are communicated to the hit accumulator
server 415, a processor such as the hit accumulator
processor 430 is desirable to verify the hits against a list of
locations that are to be monitored. Such a list of locations
may be stored in the site tracking list 425, for example. A
location, in this context, refers to the location of a document.
The location may be represented by a URL, a directory path,
or other mechanisms for uniquely identifying a particular
document. Hits that are validated by the hit accumulator
processor 430 are recorded in the per site hit database 440.
Thus, the per site hit database 440 will have a current count
of the hits for each location listed in the site tracking list 425.
In this embodiment, the hit report processor 450 may receive
requests from remote site administrators such as remote site
administrator 480 for hit reports. The hit reports can be
extracted from the per site hit database 440 and transmitted
to the requester in an HTML report, for example.

While in this embodiment the proxy server 405 and the hit
accumulation server 415 have been shown as separate
servers, the functionality of both could be combined into one
WebTV™ server 5. Additionally, the proxy 400 might be
expanded to include more than one proxy server 405. When
expanding the proxy 400 to include more than one proxy
server 405, only one hit accumulation server 415 need be
employed.

In alternative embodiments, hits may be communicated
by a proxy server 405 to the accumulation server 415 by way
of a network connection such as a permanent connection
through which events may be sent. Also, message passing
may be employed whereby the proxy server 405 sends a
message such as a datagram to the hit accumulator 415 to
notify it of a document cache hit. It is appreciated that many
other means of communicating information between servers
are possible.

An Exemplary Site Tracking List

FIG. 5A illustrates an exemplary site tracking list according
to one embodiment of the present invention. This
illustration depicts a site tracking list 425 including site
tracking list records 505 for three remote sites:
(1) http://www.companyA.com/; (2) http://www.companyB.com/;
and (3) http://www.companyC.com/. In this embodiment,
each site tracking list record 505 may include a list of one
or more URL patterns 510.

The list of URL patterns 510 may be a list of strings
identifying the initial portions (e.g., prefixes) of URLs to be
tracked. In this example, the proxy 400 tracks hits for
documents identified by URLs with a prefix that matches
any of the URL patterns 510 specified in one of the site
tracking list records 505. The hits may then be logged to a
record in the per site hit database 440 corresponding to the
site tracking list record 505 which contained the matching
URL pattern. This form of URL pattern is useful for tracking
hits for a particular grouping of Web pages beginning with
the same initial sequences of characters. For example, the
URLs for the Web pages of Company A might all begin with
"http://www.companyA.com/." Additionally, the Web pages
associated with products produced by Company A might all
begin with the sequence "http://www.companyA.com/product/."
Furthermore, pages related to a particular product
might all begin with the URL prefix
"http://www.companyA.com/product/<product_name>/" where
<product_name> identifies the particular product. To track
the hits for pages relating to Company A's Gizmo product
line, therefore, the following URL pattern may be used:
"http://www.companyA.com/product/Gizmo/." Similarly, to
track the hits for all of Company A's products the following
URL pattern may be used: "http://www.companyA.com/product/."

URL patterns are not limited to prefixes; other forms of
URL patterns may be used such as patterns including wild
card or other special characters, or patterns in the form of
standard regular expressions.
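
As a hedged illustration of the prefix form of URL pattern, the following sketch looks up which site tracking list record, if any, a requested URL falls under. The dictionary layout, the function name find_tracked_site, and the sample document names are assumptions made for this example; only the URL prefixes come from the description above.

```python
from typing import Dict, List, Optional

# Hypothetical in-memory form of a site tracking list: site name mapped to
# its list of URL prefix patterns.
SITE_TRACKING_LIST: Dict[str, List[str]] = {
    "CompanyA": ["http://www.companyA.com/product/Gizmo/"],
    "CompanyB": ["http://www.companyB.com/"],
}


def find_tracked_site(url: str, tracking_list: Dict[str, List[str]]) -> Optional[str]:
    """Return the first site whose prefix patterns match `url`, else None."""
    for site, patterns in tracking_list.items():
        # A prefix pattern matches when the URL begins with the pattern.
        if any(url.startswith(pattern) for pattern in patterns):
            return site
    return None
```

Under these assumptions, a URL such as "http://www.companyA.com/product/Gizmo/specs.html" would be attributed to CompanyA, while a CompanyA page outside the Gizmo directory would not be tracked at all.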

An Exemplary per Site Hit Database 

FIG. 5B illustrates an exemplary per site hit database
according to one embodiment of the present invention.
Based upon the information provided in the site tracking list
425 of FIG. 5A, an exemplary per site hit database might be
represented as per site hit database 440. In this example, the
per site hit database 440 includes three site hit records 515
corresponding to remote sites for CompanyA, CompanyB
and CompanyC.

In this embodiment, each site hit record 515 includes a
timestamp 525. The timestamp 525 may indicate the time
from which the hits have been accumulated. In this example,
therefore, there have been six hits to the monitored URLs
since Jan. 16, 1997 at 10:01:58. Those of skill in the art will
appreciate the timestamp 525 may represent the period of
accumulation in other ways such as elapsed time since the
last hit report was generated.

Site hit records 515 also include a remote site name 530.
The remote site names 530 from front to back correspond to
CompanyA, CompanyB, and CompanyC. Site hit record 515
further includes a list of hits 520. In this embodiment, the list
of hits 520 includes the URLs of the documents that were
requested and subsequently serviced from the proxy's local
cache (e.g., document cache 465) since the time indicated by
the timestamp 525. According to the site hit record 515 for
CompanyA, the adl.html Web page has been requested and
serviced from the proxy's local cache three times. Similarly,
the sales.html and Ql.html Web pages have been provided
from the proxy's cache once and twice, respectively. Based
upon the accumulated hit information in a particular site hit
record 515, a detailed hit report may be provided to the
corresponding remote site administrator. Hit accumulation
will be discussed further below.

FIG. 6 is a logical view of an exemplary directory
structure 600 that may exist on a remote server 4. This
exemplary directory structure 600 illustrates the need for a
flexible method of tracking the number of hits. Web pages
might reside in any or all of the directories shown. In this
example, the URL patterns within a site tracking list record
505 may identify a particular directory or directories in the
hierarchy depicted.

The remote site administrator for CompanyA may want to
know the number of hits in an Ads subdirectory 605 and an
Events subdirectory 610. This may be due to the fact that
advertising banners are shown on Web pages in these
directories and the advertisers may want feedback on how
many Web viewers are seeing their ads. Alternatively, the
company may have its own business reasons for analyzing
statistics in certain areas of their Web site. Regardless, it is
apparent that simply tracking all hits for a root directory 615
on the company's server is insufficient. For example, hits
would be tracked for directories in which the remote site
administrator had no interest. A list of URL patterns is used
to accommodate the flexibility desired. The following URL
patterns may be stored in the site tracking list 425 for
CompanyA to track the above-mentioned subdirectories:
"http://www.companyA.com/products/Events/" and
"http://www.companyA.com/products/Ads/." The list of URL
patterns 510 in each site tracking list record 505 allows a
remote site to enumerate specific directories, for example, in
which they would like to track user hits.

The advantages of providing forms of URL patterns with
wild cards become apparent with reference to the directory
structure 600. Assume the "*" character is a wild card. That
is, it matches zero or more characters. Since CompanyA has
two subdirectories with press releases, a convenient way to
track hits in both is with the following URL pattern:
"http://www.companyA.com/*press_releases/." Without the use
of a wild card, the equivalent URL patterns are as follows:
"http://www.companyA.com/press_releases/" and
"http://www.companyA.com/products/press_releases/." Thus, it
should be appreciated that wild cards and regular expressions
provide additional efficiency and convenience in the
specification of URL patterns.
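
The wild card form can be illustrated with a short sketch built on the shell-style matcher in the Python standard library. Using fnmatchcase is a choice made for this example, not something the description prescribes; the trailing "*" appended to each pattern (so that a directory pattern also covers the documents beneath it), the function name, and the sample document names are likewise assumptions.

```python
from fnmatch import fnmatchcase


def matches_url_pattern(url: str, pattern: str) -> bool:
    """Return True if `url` falls under the wild card `pattern`."""
    # Assumption of this sketch: append "*" so that a directory pattern
    # also matches every document stored below that directory.
    return fnmatchcase(url, pattern + "*")


PRESS_PATTERN = "http://www.companyA.com/*press_releases/"

in_root = matches_url_pattern(
    "http://www.companyA.com/press_releases/jan.html", PRESS_PATTERN)
in_products = matches_url_pattern(
    "http://www.companyA.com/products/press_releases/feb.html", PRESS_PATTERN)
elsewhere = matches_url_pattern(
    "http://www.companyA.com/products/Ads/banner.html", PRESS_PATTERN)
```

One caveat on this design choice: fnmatchcase also gives special meaning to "?" and "[...]", so patterns containing those characters (for example, URLs with query strings) would need escaping or a different matcher.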

Hit Accumulation 

FIG. 7 is a flow diagram illustrating a method of per- 
forming hit accumulation according to one embodiment of 
the present invention. In this embodiment, each site hit 
record 515 begins in an initial state having an indication of 
the remote site (e.g., the name 530) and a timestamp 525 
representing the time at which hit accumulation began. 
Initially, the hit accumulation server 415 waits for an indi- 
cation that a client request has been serviced from the 
proxy's local cache (step 710). For example, the hit accu- 
mulator processor 430 may determine that a new entry has 
been made to the hit log 420. 

Upon receiving an indication that the proxy 400 has 
served up a cached response, the hit accumulation server 415 
determines if the URL of the document retrieved from the 
proxy's local cache is one whose hits are to be tracked. As 
discussed above, not all hits are tracked. In this embodiment, 
hits are tracked only for documents matching URL patterns 
that have been registered in a tracking list such as the site 
tracking list 425, discussed above. Therefore, the hit accu- 
mulator processor 430 compares the URL of the retrieved 
document to URL patterns 510 in each site tracking list 
record 505 to determine if the hit will be recorded in the per 
site hit database 440 (step 720). If no URL patterns 510 
match the retrieved document, the hit is ignored. Otherwise,
if the retrieved document matches any of the URL patterns 
510, then the appropriate site hit record 515 in the per site 
hit database 440 is updated (step 730). 

Update of the site hit record 515 can be explained briefly
with respect to FIG. 5B. In this embodiment, the appropriate
site hit record 515 is searched for an entry that matches the
URL of the retrieved document. If the retrieved document's
URL does not already exist in the list of hits 520 for the site
hit record 515, then the URL is added and its count is set to
one since this is the document's first hit. However, if the
retrieved document's URL was already in the list of hits 520
(meaning it has had at least one previous hit), then only the
corresponding count needs to be incremented. In this
manner, each document retrieved from the proxy's local
cache that matches a tracked URL pattern will have an entry
in the list of hits 520 with a corresponding count indicating
the number of cache hits.
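
The update rule described above (add the URL with a count of one on its first hit, otherwise increment the existing count) might be sketched as follows. The SiteHitRecord dataclass, its field names, and the sample URLs are hypothetical; a Counter is used here only because it covers both the first-hit and repeat-hit cases in a single statement.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class SiteHitRecord:
    """Hypothetical in-memory form of a site hit record."""
    site_name: str                  # remote site name
    timestamp: datetime             # time from which hits are accumulated
    hits: Counter = field(default_factory=Counter)  # URL -> cache hit count


def record_hit(record: SiteHitRecord, url: str) -> None:
    # A URL not yet in the Counter starts at zero, so the first hit sets
    # its count to one and later hits simply increment it.
    record.hits[url] += 1


record = SiteHitRecord("CompanyA", datetime(1997, 1, 16, 10, 1, 58))
record_hit(record, "http://www.companyA.com/sales.html")
record_hit(record, "http://www.companyA.com/sales.html")
record_hit(record, "http://www.companyA.com/index.html")
```

After these three calls, the record would hold a count of two for the first URL and one for the second.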

Hit Reporting 

Referring now to FIG. 8, a method of hit reporting 
according to one embodiment of the present invention is 
illustrated. In this embodiment of the present invention, the 
hit accumulator server 415, in addition to its other
responsibilities, acts as a Web server providing an HTTP 
interface by which remote site administrators can access the 
accumulated hits for their respective tracked sites. The hit 
report processor 450 waits until a request is received from a 
remote site administrator (step 810). Preferably, the HTTP
address on the hit accumulation server 415 can be used to
identify the requester of the information. For example, the
hit report for Company A might be accessed on the hit
accumulation server 415 at: "http://www.webtv.net/hits/company_a."

To limit access to the hit reports, a secure communication
technology such as Secure Sockets Layer (SSL) or other
available secure communication protocol can be used to
keep the hit information private by providing encrypted
communications across the network. Additionally, the report
requests can be authenticated to assure only a particular
remote server or individual can access the information (step
[illegible]).

Once a request has been received from a remote site
administrator and it has been optionally authenticated, then
a report can be generated from the hit data accumulated such
as the list of hits 520 for the particular site hit record 515
(step 830). In this embodiment, the report may include a list
of URLs and their corresponding counts since the last report.
For convenient access via the Web, the report may be
formatted in an HTML format. Also, for the convenience of
the remote site administrator, a timestamp that identifies the
starting point of the accumulation may be included in the
report. The level of specificity of the URL list may be at the
document level thereby allowing the remote site administrator
to determine the number of hits for individual documents.
However, it may also be helpful to additionally
summarize the hits by directory, for example. It will be
recognized that numerous other ways of formatting and
arranging the hit reports are possible.

After the report has been formatted, the response containing
the report is transmitted to the remote site administrator
(step 840).

In this embodiment, before resuming the hit accumulation
of FIG. 7, the accumulated data in the site hit record 515 is
cleared (step 850), and the timestamp 525 is reset to reflect
the current time. The above steps for retrieving a report from
the proxy may be periodically repeated at the convenience of
the remote site administrator whenever an accurate total hit
count is desired.

Digest 5 (MD5) hash algorithm. The hash algorithm can be
thought of as a mechanism for assigning a range of URLs to
each of the proxy servers 405 in the proxy 900.

In this embodiment, the dispatcher 910 receives document
requests including URLs from a client such as WebTV™
client 1. Based upon the URL in the request, the dispatcher
910 determines the proxy server 405 in which the document
should be cached, and transmits the client request to that
proxy server. If the document requested by the client is
not found in the proxy server's local document cache 465,
then the proxy server 405 requests the document from the
appropriate site (e.g., a remote server) and caches the
document when it is received from the remote server.

If redundancy is desired, the hashed result of a URL may
be used to identify a cluster of two or more proxy servers
rather than a single proxy server 405. In this manner, the
load required to support a popular document can be shared
among a group of proxy servers.

In an alternative embodiment, a decentralized dispatching
scheme can be implemented. For example, the proxy servers
405 may be arranged to form an interconnected ring configuration
and the functionality of the dispatcher 910 may be
incorporated into each proxy server 405. In this
embodiment, the client document requests may be initially
handled by one of the proxy servers 405 in the ring. If the
requested document is not found in the local cache of the
initial proxy server, the initial proxy server may forward the
request to the appropriate proxy server based on the hashing
scheme discussed above.

FIG. 10 is a flow diagram illustrating a method of
dispatching requests to segregate the storage of documents
according to one embodiment of the present invention.
While both a centralized and a decentralized request dispatching
mechanism have been discussed above, the method
described below is generally applicable to both. In this
embodiment, initially, a document request is received from
a client (step 1010). If a centralized dispatcher such as
dispatcher 910 receives the request, then based upon the
URL an appropriate proxy server is determined based upon
the output of the hash algorithm (step 1020).

However, in a decentralized dispatching environment, the
initial proxy server receiving the client request may assume

In alternative embodiments, hit reports may be provided it is the appropriate prox y server and first ch eck its local 

to remote sites in a number of other ways. Hit reports need document ^^^&/"^S^.4s^S^^^M^^g!t,^^ 
not be initiated by a request from the remote site adminis- 



trator For example, the proxy may periodically send I unso- ^.^^^^^ 
licited hit reports via email, the proxy may periodically pnate for the request (step 1020)f 
download hit updates to a device specified by the remote site Mter — determining the proxy server appropriate for the 
administrator, or the hit reports might be transmitted to client request, the request is forwarded to that proxy server 
remote site administrators in the form of datagrams. In any 50 (step 1030). The proxy server 405 attempts to service the 
event, the assignees of the present invention appreciate a request from its local document cache 465. If a cache hit 
variety of reporting mechanisms are possible. occurs, then the document is immediately available from-the 

proxy_server^s_lp^l_d^c3iment_cache_465jHowewr, if & 
Allocation of Cache Space within a Proxy fcache. miss occurs, the proxy server 405 wS'rSneve" th* 



FIG. 9 is a data flow diagram illustrating the interaction 



55 rdocu^n^^ffom'"*an s ' appropriate server ahd"sYore a copy7 
Mocally* lirany *eve^t"/lhe , cenu^Uze{l''or'decenrraUzed dis* 



of proxy components according to another embodiment of 

the present invention. In this embodiment, proxy 900 'patcAtogmechanismulnmately receivesaresponsefrom the 
includes a plurality of proxy servers 405 communicatively server (e.g., the documenl requested by the client) (step 
coupled to a dispatcher 910 and a hit accumulator server 1040). Finally, the response, typically in the form of an 
415. Rather than allowing a given proxy server's cached 60 HTML document is forwarded to the client (step 1050). This 
contents to be determined based upon the requests of an method of caching documents segregates the content of the 
associated client, the content of the Web can be distributed Web based upon the URL of the documents. Since each URL 
among proxy servers 405 by a hash algorithm executed by will map to only one proxy server 405, advantageously this 
the dispatcher 910. The hash algorithm preferably maps a approach more efficiently allocates the proxy's cache space 
given URL to one and only one of the plurality of proxy 65 by avoiding unnecessary redundancy, 
servers 405. This can be accomplished using a portion of the In the foregoing specification, the invention has been 
output of a secure hash algorithm such as the Message described with reference to specific embodiments thereof. It 
will, however, be evident that various modifications and
changes may be made thereto without departing from the
broader spirit and scope of the invention. The specification
and drawings are, accordingly, to be regarded in an
illustrative rather than a restrictive sense.

What is claimed and desired to be secured by United
States Letters Patent is:

1. In a networked computer system such as the Internet
that includes a plurality of remote servers, a plurality of
proxy servers and a plurality of client systems, all of which
are logically interconnected so that the client systems can
access informational content stored at the one or more
remote servers, and wherein at least one of the client systems
is comprised of an electronics unit which provides a
graphical user interface by which the Internet can be accessed and
browsed using a conventional television set as a display, a
method of efficiently allocating cache space within the
plurality of proxy servers so that requested content from one
or more Web pages is distributively cached at mutually
exclusive proxy servers, comprising steps for:

dividing responsibility for obtaining and caching content
among a plurality of proxy servers, wherein at least two
proxy servers are responsible for obtaining and caching
mutually exclusive content;

receiving a request for downloading content from a
particular Web page identified by a uniform resource
locator ("URL");

rather than allowing any arbitrary proxy server to obtain
and cache the requested content from the particular
Web page identified, mapping the URL of the particular
Web page to a particular one and only one of a plurality
of mutually exclusive ranges of URLs that are
distributed among the plurality of proxy servers; and

at a proxy server assigned the particular mutually
exclusive range to which the URL of the particular Web page
is mapped, searching for the requested content in a
local cache, and if the requested content is found in the
local cache, returning it to the client system from which
the request was received, and if the requested content
is not found, then obtaining the requested content from
one of the remote servers and storing it in the local
cache, and returning the requested content to the client
system from which the request was received.

2. In a networked computer system such as the Internet
that includes a plurality of remote servers, a plurality of
proxy servers and a plurality of client systems, all of which
are logically interconnected so that the client systems can
access informational content stored at the one or more
remote servers, and wherein at least one of the client systems
is comprised of an electronics unit which provides a
graphical user interface by which the Internet can be accessed and
browsed using a conventional television set as a display, a
computer program product for implementing a method of
efficiently allocating cache space within the plurality of
proxy servers so that requested content from one or more
Web pages is distributively cached at mutually exclusive
proxy servers, comprising a computer readable medium for
storing executable instructions for implementing the
method, and wherein the method comprises steps for:

dividing responsibility for obtaining and caching content
among a plurality of proxy servers, wherein at least two
proxy servers are responsible for obtaining and caching
mutually exclusive content;

receiving a request for downloading content from a
particular Web page identified by a uniform resource
locator ("URL");

rather than allowing any arbitrary proxy server to obtain
and cache the requested content from the particular
Web page identified, mapping the URL of the particular
Web page to a particular one and only one of a plurality
of mutually exclusive ranges of URLs that are
distributed among the plurality of proxy servers; and

at a proxy server assigned the particular mutually
exclusive range to which the URL of the particular Web page
is mapped, searching for the requested content in a
local cache, and if the requested content is found in the
local cache, returning it to the client system from which
the request was received, and if the requested content
is not found, then obtaining the requested content from
one of the remote servers and storing it in the local
cache, and returning the requested content to the client
system from which the request was received.

3. A method as recited in claim 1 or claim 2, wherein the
step for mapping identifies a cluster of two or more proxy
servers collectively assigned the particular one of the
plurality of mutually exclusive ranges of URLs, the method
further comprising a step for storing requested content that
is retrieved from one of the remote servers in the local cache
of each proxy server in the cluster.

4. A method as recited in claim 3, further comprising a
step for distributing multiple requests for the requested
content among the two or more proxy servers of the cluster.

5. [...] mapping the URL of the particular Web page
comprises the act of applying a hash algorithm to the URL.

6. A method as recited in claim 5, wherein the hash
algorithm comprises a Message Digest 5 algorithm.

7. A method as recited in claim 1 or claim 2, wherein a
central dispatcher receives the request for downloading
content and maps the URL of the particular Web page, the
method further comprising a step for forwarding the request
to the assigned proxy server.

8. A method as recited in claim 1 or claim 2, wherein an
initial proxy server receives the request for downloading
content, the method further comprising a step for forwarding
the request to the assigned proxy server after the initial
proxy server searches for the requested content in a local
cache and the requested content is not found.

9. In a networked computer system such as the Internet
that includes a plurality of remote servers, a plurality of
proxy servers and a plurality of client systems, all of which
are logically interconnected so that the client systems can
access informational content stored at the one or more
remote servers, and wherein at least one of the client systems
is comprised of an electronics unit which provides a
graphical user interface by which the Internet can be accessed and
browsed using a conventional television set as a display, a
method of efficiently allocating cache space within the
plurality of proxy servers so that requested content from one
or more Web pages is distributively cached at mutually
exclusive proxy servers, comprising acts of:

assigning to each of a plurality of proxy servers a
mutually exclusive range of uniform resource locators
("URL");

hashing a URL, received as part of a request from a client
system for downloading content from a particular Web
page;

rather than allowing any arbitrary proxy server to retrieve
and cache the requested content, identifying, with at
least a portion of the hashed URL, at least one proxy
server assigned to a particular mutually exclusive range
of URLs that corresponds to the requested content; and

at the at least one identified proxy server, examining a
local cache for the requested content, and if found in
the local cache, sending the requested content to the
client system, but if not found in the local cache,
retrieving the requested content from one of the remote
servers, storing the requested content in the local cache,
and sending the requested content to the client system.
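
The acts recited above can be illustrated with a minimal sketch. It assumes a modulo over the first four bytes of an MD5 digest as the way to partition URLs into mutually exclusive ranges, and an in-memory dictionary as the local cache; the names `proxy_index`, `ProxyServer`, `Dispatcher`, and `fetch_from_origin` are hypothetical and do not appear in the specification.

```python
import hashlib


def proxy_index(url: str, num_proxies: int) -> int:
    """Map a URL to exactly one proxy index (assumed partitioning scheme)."""
    digest = hashlib.md5(url.encode("utf-8")).digest()
    # Use only a portion of the hash output: the first 4 bytes.
    return int.from_bytes(digest[:4], "big") % num_proxies


class ProxyServer:
    """Hypothetical proxy with a local document cache."""

    def __init__(self, name, fetch_from_origin):
        self.name = name
        self.cache = {}                      # local cache: URL -> content
        self.fetch_from_origin = fetch_from_origin
        self.origin_fetches = 0              # counts cache misses

    def handle(self, url):
        # Cache hit: return the stored copy.
        if url in self.cache:
            return self.cache[url]
        # Cache miss: retrieve from the remote server, store, then return.
        content = self.fetch_from_origin(url)
        self.origin_fetches += 1
        self.cache[url] = content
        return content


class Dispatcher:
    """Hypothetical central dispatcher that hashes the URL and forwards."""

    def __init__(self, proxies):
        self.proxies = proxies

    def dispatch(self, url):
        target = self.proxies[proxy_index(url, len(self.proxies))]
        return target.handle(url)
```

Because every request for a given URL reaches the same proxy in this sketch, at most one proxy holds a copy of any document, which is the redundancy-avoiding property the method relies on.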
10. In a networked computer system such as the Internet 
that includes a plurality of remote servers, a plurality of 
proxy servers and a plurality of client systems, all of which 
are logically interconnected so that the client systems can 
access informational content stored at the one or more
remote servers, and wherein at least one of the client systems
is comprised of an electronics unit which provides a graphi-
cal user interface by which the Internet can be accessed and
browsed using a conventional television set as a display, a
computer program product for implementing a method of
efficiently allocating cache space within the plurality of
proxy servers so that requested content from one or more
Web pages is distributively cached at mutually exclusive
proxy servers, comprising a computer readable medium for
storing executable instructions for implementing the
method, and wherein the method comprises acts of:

assigning to each of a plurality of the proxy servers a
mutually exclusive range of uniform resource locators
("URLs");

hashing a URL, received as part of a request from a client
system for downloading content from a particular Web
page;

rather than allowing any arbitrary proxy server to retrieve
and cache the requested content, identifying, with at
least a portion of the hashed URL, at least one proxy
server assigned to a particular mutually exclusive range
of URLs that corresponds to the requested content; and
at the at least one identified proxy server, examining a
local cache for the requested content, and if found in 
the local cache, sending the requested content to the 
client system, but if not found in the local cache, 
retrieving the requested content from one of the remote 
servers, storing the requested content in the local cache, 
and sending the requested content to the client system. 

11. A method as recited in claim 9 or claim 10, wherein 
the act of hashing identifies a cluster of two or more proxy 
servers collectively assigned to the particular mutually 
exclusive range of URLs that corresponds to the requested 
content, the method further comprising an act of adding 
requested content that is received from one of the remote 
servers to the local cache of each proxy server in the cluster. 

12. A method as recited in claim 11, further comprising an 
act of load balancing multiple requests for the requested 
content among the two or more proxy servers of the cluster. 
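
The cluster variant of claims 11 and 12 might look like the following sketch. It assumes round-robin selection as the load-balancing policy (the claims do not name one) and again uses a modulo over part of an MD5 digest to pick the cluster; `Cluster`, `cluster_for`, and `fetch_from_origin` are hypothetical names.

```python
import hashlib
import itertools


class Cluster:
    """Hypothetical cluster of proxies sharing one URL range."""

    def __init__(self, size, fetch_from_origin):
        self.caches = [dict() for _ in range(size)]   # one cache per member
        self.fetch_from_origin = fetch_from_origin
        self._next = itertools.cycle(range(size))     # round-robin (assumed)
        self.served_by = []                           # member chosen per request

    def handle(self, url):
        member = next(self._next)
        self.served_by.append(member)
        cache = self.caches[member]
        if url not in cache:
            content = self.fetch_from_origin(url)
            # Add the retrieved content to the cache of every member.
            for c in self.caches:
                c[url] = content
        return cache[url]


def cluster_for(url, clusters):
    """Identify the cluster for a URL from a portion of its MD5 digest."""
    digest = hashlib.md5(url.encode("utf-8")).digest()
    return clusters[int.from_bytes(digest[:4], "big") % len(clusters)]
```

In this sketch a popular document is fetched from the remote server once, after which successive requests alternate between the cluster members.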

13. A method as recited in claim 9 or claim 10, wherein 
the hashing comprises a Message Digest 5 algorithm. 

14. A method as recited in claim 9 or claim 10, wherein 
a central dispatcher receives the request for downloading 
content and hashes the received URL, the method further 
comprising an act of sending the request to the at least one 
identified proxy server. 

15. A method as recited in claim 9 or claim 10, wherein 
an initial proxy server receives the request for downloading 
content, the method further comprising an act of sending the 
request to the at least one identified proxy server after the 
initial proxy server searches for the requested content in a 
local cache and the requested content is not found. 
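
The decentralized arrangement of claim 15 can be sketched as follows, assuming each proxy holds a reference to the full list of peers and applies the same hash as the other sketches of this kind; `RingProxy`, `build_ring`, and `fetch_from_origin` are hypothetical names, and the peer list stands in for whatever ring interconnection an implementation would use.

```python
import hashlib


class RingProxy:
    """Hypothetical proxy that embeds the dispatcher's hashing logic."""

    def __init__(self, index, fetch_from_origin):
        self.index = index
        self.cache = {}
        self.fetch_from_origin = fetch_from_origin
        self.ring = []                     # filled in by build_ring

    def assigned_index(self, url):
        digest = hashlib.md5(url.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % len(self.ring)

    def handle(self, url):
        # The initial proxy first searches its own local cache.
        if url in self.cache:
            return self.cache[url]
        # On a miss, send the request to the proxy identified by the hash.
        owner = self.ring[self.assigned_index(url)]
        if owner is not self:
            return owner.handle(url)
        # This proxy is the assigned one: retrieve, store, and return.
        content = self.fetch_from_origin(url)
        self.cache[url] = content
        return content


def build_ring(n, fetch_from_origin):
    ring = [RingProxy(i, fetch_from_origin) for i in range(n)]
    for proxy in ring:
        proxy.ring = ring
    return ring
```

Whichever proxy a client contacts first, only the assigned proxy ends up caching the document in this sketch, so the cache contents remain mutually exclusive.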

***** 